Fault tolerance through redundant execution on COTS multicores: Exploring trade-offs
Authors
Yanyan Shen, Gernot Heiser and Kevin Elphinstone
DATA61 and UNSW Sydney
Abstract
High availability and integrity are paramount in systems deployed in life- and mission-critical scenarios. Such fault tolerance can be achieved through redundant co-execution (RCoE) on replicated hardware, now cheaply available with multicore processors. RCoE replicates almost all software, including the OS kernel, drivers, and applications, achieving a sphere of replication that covers everything except the minimal interfaces to non-replicated peripherals. We complement our original, loosely-coupled RCoE with a closely-coupled version that improves transparency of replication to application code, and investigate the functionality, performance, and vulnerability trade-offs.
BibTeX Entry
@inproceedings{Shen_HE_19,
address = {Portland, Oregon, USA},
author = {Shen, Yanyan and Heiser, Gernot and Elphinstone, Kevin},
booktitle = {International Conference on Dependable Systems and Networks (DSN)},
date = {2019-06-24},
doi = {10.1109/DSN.2019.00031},
issn = {1530-0889},
keywords = {{seL4}; microkernel; {SEU}; replication; fault tolerance},
month = jun,
pages = {188--200},
paperurl = {https://trustworthy.systems/publications/full_text/Shen_HE_19.pdf},
publisher = {IEEE},
title = {Fault Tolerance Through Redundant Execution on {COTS} Multicores: Exploring Trade-offs},
year = {2019}
}