Aim: develop software approaches to improving the trustworthiness of commodity off-the-shelf (COTS) hardware.
Overview: The formal verification of the seL4 microkernel has produced an operating system that is guaranteed mathematically to be correct and enforce isolation. The seL4 microkernel can consolidate safety and security critical software on a single device where previously multiple devices were used to provide air gap isolation. One of the barriers to wide-spread consolidation on commodity hardware is the lack of hardware dependability features. A hardware fault triggered by cosmic rays, alpha particle strikes, etc. could invalidate the strong mathematical guarantees that are based on the hardware behaving correctly, and pierce the virtual air gap seL4 provides.
This project aims to improve the trustworthiness of commodity hardware to enable a verified microkernel to be used in situations previously needing an air gap. We explore leveraging multicore processors to provide redundancy, and thus increase the trustworthiness of a virtual air gap, and thus systems running on COTS hardware.
The key challenge of the approach is to provide improved trustworthiness in the presence of transient hardware faults that also manifest themselves in operating system itself, while retaining good performance. A typical COTS hardware platform also features non-redundant devices, which creates another challenge of improving trustworthiness in the presence of non-redundant device drivers interacting with a redundant overall system.
Kernel implementation and mechanisms for improved trustworthiness: We are evolving seL4 and its API to support both improved trustworthiness through redundant software execution on multicore processors. The activity includes:
Contact: Kevin Elphinstone, kevin.elphinstonedata61.csiro.au
The project has uccessfully created a version of seL4 that provides redundant co-execution (RCoE) of an seL4-based system on COTS multicore, i.e. without any non-standard hardware support. It can run in dual or triple modular redundancy (DMR or TMR) configuration.
The replication and fault-tolerance implementation is done at the kernel level, meaning that the whole system is replicated, except for the minimal interfaces to non-replicated devices (mostly device register reads and writes). Our RCoE implementation is the first system that achieves such maximal replication on COTS hardware.
Our RCoE setup comes in two versions:
Both versions are share the following properties:
Having demonstrated the above, the projectis concluded for now. We will be happy to revive it should there be commercial interest.
Coming soon!
Past
|
![]() |
![]() |
Yanyan Shen, Gernot Heiser and Kevin Elphinstone Fault tolerance through redundant execution on COTS multicores: Exploring trade-offs International Conference on Dependable Systems and Networks (DSN), pp. 188-200, Portland, Oregon, USA, June, 2019 |
![]() |
![]() |
Yanyan Shen Microkernel mechanisms for improving the trustworthiness of commodity hardware PhD Thesis, UNSW, Sydney, Australia, March, 2019 |
![]() |
![]() |
Yanyan Shen and Kevin Elphinstone Microkernel mechanisms for improving the trustworthiness of commodity hardware European Dependable Computing Conference, pp. 12, Paris, France, September, 2015 |
![]() |
![]() |
Kevin Elphinstone and Yanyan Shen Improving the trustworthiness of commodity hardware with software Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology (DCDV), pp. 6, Budapest, Hungary , June, 2013 |