Fault tolerance through redundant execution on COTS multicores: Exploring trade-offs
Authors
Yanyan Shen, Gernot Heiser and Kevin Elphinstone
DATA61
UNSW Sydney
Abstract
High availability and integrity are paramount in systems deployed in life- and mission-critical scenarios. Such fault tolerance can be achieved through redundant co-execution (RCoE) on replicated hardware, which multicore processors now make cheaply available. RCoE replicates almost all software, including the OS kernel, drivers, and applications, achieving a sphere of replication that covers everything except the minimal interfaces to non-replicated peripherals. We complement our original, loosely-coupled RCoE with a closely-coupled version that makes replication more transparent to application code, and investigate the trade-offs among functionality, performance, and vulnerability.
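To make the idea of a sphere of replication more concrete, here is a minimal, hypothetical sketch in Python. It is not the RCoE implementation, which replicates the OS kernel, drivers, and applications across cores. Each replica runs the same deterministic computation. A single-event upset is simulated by flipping one bit in one replica's result, and the outputs are compared before they would leave the sphere for a non-replicated device. The names `run_replicas`, `vote`, and `checksum`, and the use of majority voting (triple modular redundancy), are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence


def run_replicas(compute: Callable[[int], int], inp: int, n: int = 3,
                 faulty: Optional[int] = None, flip_bit: int = 0) -> List[int]:
    """Run the same deterministic computation once per replica.

    If `faulty` names a replica index, one bit of that replica's result is
    flipped to mimic a single-event upset (hypothetical fault model).
    """
    outputs = []
    for i in range(n):
        out = compute(inp)
        if i == faulty:
            out ^= 1 << flip_bit  # inject the simulated bit flip
        outputs.append(out)
    return outputs


def vote(outputs: Sequence[int]) -> int:
    """Compare replica outputs at the edge of the sphere of replication.

    Returns the value held by a strict majority of replicas; raises if no
    value has one (e.g. two replicas that disagree).
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError(f"replicas diverged without a majority: {outputs}")
    return value


def checksum(x: int) -> int:
    """Stand-in deterministic workload (hypothetical)."""
    return (x * 2654435761) & 0xFFFFFFFF


if __name__ == "__main__":
    outs = run_replicas(checksum, 42, n=3, faulty=1, flip_bit=5)
    print(vote(outs) == checksum(42))  # True: the corrupted replica is outvoted
```

With three replicas, a single corrupted result is outvoted. With only two (dual modular redundancy), the same fault can be detected but not masked, which is why `vote` raises when no value holds a strict majority.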
BibTeX Entry
@inproceedings{Shen_HE_19,
  address = {Portland, Oregon, USA},
  author = {Shen, Yanyan and Heiser, Gernot and Elphinstone, Kevin},
  booktitle = {International Conference on Dependable Systems and Networks (DSN)},
  date = {2019-06-24},
  doi = {10.1109/DSN.2019.00031},
  issn = {1530-0889},
  keywords = {{seL4}; microkernel; {SEU}; replication; fault tolerance},
  month = jun,
  pages = {188--200},
  paperurl = {https://trustworthy.systems/publications/full_text/Shen_HE_19.pdf},
  publisher = {IEEE},
  title = {Fault Tolerance Through Redundant Execution on {COTS} Multicores: Exploring Trade-offs},
  year = {2019}
}