Microkernel Performance Metrics
Traditionally, the performance metric people care most with microkernels is the basic IPC cost, i.e. the cost for sending a single, minimal-length message between two threads. While it can be debated how meaningful such a microbenchmark is, IPC is the most critical operation of a microkernel.
All figures shown here are measured under the following circumstances:
- hot cache (tight loop)
- one way (half of round-trip)
- inter address space (requiring a full switch of addressing context)
Data
Kernel | Architecture | Processor | Clock | Latency | Time | Measured | Source | |
---|---|---|---|---|---|---|---|---|
MHz | Cycles | ns | by | year | ||||
seL4 | Intel x86 (32 bit) | Core i7 4770 (Haswell) | 3,400 | 301 | 89 | NICTA | 2013 | SOSP'13 |
seL4 | ARM v7 | Cortex A9 | 1,000 | 316 | 316 | NICTA | 2013 | SOSP'13 |
seL4 | ARMv6 | ARM11 | 532 | 188 | 353 | NICTA | 2013 | SOSP'13 |
NOVA(1) | x86 (32 bit) | Core i7 920 (Bloomfield) | 2,667 | 288 | 108 | TU Dresden | 2010 | EuroSys'10 |
okL4 | ARMv5 | XScale 255 | 400 | 151 | 378 | NICTA | 2007 | NICTA |
okL4 | ARM v4 | StrongARM SA1100 | 206 | 131 | 635 | NICTA | 2007 | NICTA |
L4KA:Pistachio | IA64 | Itanium 2 | 1,500 | 36 | 24 | NICTA | 2005 | Usenix'05 |
L4KA:Pistachio | x86 (64 bit) | AMD Opteron 242 | 1,600 | 230 | 144 | University of Karlsruhe | 2004? | L4Ka site |
Hazelnut | x86 (32 bit) | Pentium 4 | 1,400 | 1,008 | 720 | University of Karlsruhe | 2002 | L4Ka site |
Hazelnut | x86 (32 bit) | Pentium II | 400 | 273 | 683 | University of Karlsruhe | 2002 | L4Ka site |
L4/MIPS(2) | MIPS64 | R4700 | 100 | 86 | 860 | UNSW | 1997 | HotOS'97 |
L4/Alpha(2) | Alpha | 21064 | 433 | 45 | 104 | TU Dresden | 1997 | HotOS'97 |
Original | x86 | Pentium | 166 | 121 | 756 | Liedtke | 1997 | HotOS'97 |
Original | x86 | i486 | 50 | 250 | 5,000 | Liedtke | 1993 | SOSP'93 |
Notes:
- NOVA does not support the standard GCC version of thread local storage (using segment registers). Instead it supports TLS via a general-purpose register (compiler options -ffixed-rbp and -ftls-model=global-dynamic). This reduces context-switch costs at the expense of slightly reduced application performance.
- The MIPS and Alpha figures published in HotOS'97 were for incomplete kernel implementations. Actual costs for the fully functional kernels were higher (100 cycles on MIPS and 70–80 cycles on Alpha).