L4 performance
The tables below show the one-way message-passing IPC cost (half of one round trip) for various CPUs and L4 configurations. Times are clock cycles spent in the kernel, including the overhead of entering and exiting kernel mode. They are given for several payload sizes, measured as the number of message registers (MRs) used, each of which holds one machine word.
These results are for our L4 version, NICTA::Pistachio-embedded, and our para-virtualized Linux, Wombat. Neither system is supported any longer: both have been superseded by OKL4 and OK Linux from Open Kernel Labs, which have since been developed further, including additional performance improvements.
NICTA::Pistachio-embedded IPC
ARM (5 physical message registers)
These numbers are for IPC between separate address spaces (including the context switch).
Processor | Speed | 0..4 MRs (cycles) | 8 MRs (cycles) | 16 MRs (cycles) |
---|---|---|---|---|
XScale PXA255 (ARMv5) | 400 MHz | 151 | 188 | 228 |
StrongARM SA1100 (ARMv4) | 206 MHz | 131 | 141 | 161 |
Fast address-space switching (FASS, aka FCSE) is enabled.
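Because the table reports cycles, it can help to convert them to wall-clock time and to estimate the marginal cost of each extra message register. The sketch below is only illustrative arithmetic on the table values, assuming the listed clock speeds are the ones used for the measurements. The names `IpcRow`, `cycles_to_ns` and `cycles_per_extra_mr` are hypothetical and are not part of any L4 or OKL4 API. Only the 8 to 16 MR step is used for the per-register estimate, because the exact register count behind the "0..4 MRs" column is not stated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IpcRow:
    cpu: str
    mhz: int      # core clock in MHz, as listed in the table
    base: int     # one-way IPC cycles for the "0..4 MRs" column
    mr8: int      # one-way IPC cycles with 8 MRs
    mr16: int     # one-way IPC cycles with 16 MRs

# Values copied from the ARM IPC table above.
ROWS = [
    IpcRow("XScale PXA255", 400, 151, 188, 228),
    IpcRow("StrongARM SA1100", 206, 131, 141, 161),
]

def cycles_to_ns(cycles: int, mhz: int) -> float:
    """Convert a cycle count to nanoseconds at the given clock rate."""
    # f MHz means f cycles per microsecond, so one cycle lasts 1000 / f ns.
    return cycles * 1000.0 / mhz

def cycles_per_extra_mr(row: IpcRow) -> float:
    """Average cost of each of the 8 message registers added between the 8- and 16-MR columns."""
    return (row.mr16 - row.mr8) / 8

for row in ROWS:
    print(f"{row.cpu}: {cycles_to_ns(row.base, row.mhz):.1f} ns for 0..4 MRs, "
          f"about {cycles_per_extra_mr(row):.1f} cycles per extra MR from 8 to 16")
```

On these figures, a short IPC on the 400 MHz PXA255 takes about 0.38 µs, and each additional message register beyond 8 adds only a few cycles.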
Wombat: para-virtualised Linux
Wombat, our architecture-independent para-virtualised Linux for L4-embedded, runs on ARM, x86 and MIPS. On ARMv4 and ARMv5 processors, such as ARM9 cores or the XScale, Wombat benefits from the fast address-space switching (FASS) implemented in L4-embedded, which native Linux distributions do not support.
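On ARMv4/v5 cores the caches are virtually indexed and tagged, so a native Linux context switch normally has to flush them. FASS avoids much of this cost with the help of the ARM Fast Context Switch Extension (FCSE), which relocates the lowest 32 MB of each process's virtual addresses into a distinct 32 MB slot selected by a 7-bit process ID, so different processes see different cache addresses. The sketch below is a simplified Python model of that relocation step only; it ignores the protection side of FASS (such as ARM domain handling), and `fcse_mva` is a hypothetical name, not an L4 function.

```python
FCSE_WINDOW = 32 * 1024 * 1024  # FCSE relocates only virtual addresses below 32 MB

def fcse_mva(va: int, pid: int) -> int:
    """Simplified model of ARM FCSE: return the modified virtual address (MVA)
    seen by the caches and TLB for virtual address `va` under process ID `pid`."""
    if not 0 <= pid < 128:
        raise ValueError("the FCSE PID is a 7-bit field (0..127)")
    if va < FCSE_WINDOW:
        # Low addresses are moved into the PID's own 32 MB slot (PID in bits 31:25).
        return va | (pid << 25)
    # Addresses at or above 32 MB pass through unchanged.
    return va

# The same user address in two processes maps to two different MVAs,
# so their cache lines do not alias and no flush is needed between them.
print(hex(fcse_mva(0x8000, 1)), hex(fcse_mva(0x8000, 2)))
```

The flip side of this design is that each relocated process is limited to a 32 MB address window, which is one reason FASS needs kernel support rather than being a drop-in change.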
Wombat ARM: XScale PXA255 @ 200 MHz
These numbers are LMBench results for native ARM Linux and Wombat on the PLEB2 reference platform.
LMBench latency test results. Times are in microseconds. Lower is better.
Latencies | Linux | Wombat | Relative Performance (Linux/Wombat) | Comment |
---|---|---|---|---|
lat_ctx -s 0 2 | 190.8 | 6.48 | 29.44 | Context switch latencies |
lat_ctx -s 0 3 | 197.1 | 18.82 | 10.47 | |
lat_ctx -s 0 4 | 199.5 | 19.78 | 10.09 | |
lat_ctx -s 0 10 | 215.7 | 44.07 | 4.89 | |
lat_ctx -s 4 2 | 257.7 | 7.15 | 36.04 | |
lat_ctx -s 4 3 | 259.3 | 23.26 | 11.15 | |
lat_ctx -s 4 4 | 293.4 | 40.28 | 7.28 | |
lat_ctx -s 4 10 | 285.1 | 141.96 | 2.01 | |
lat_fifo | 377.0 | 80.07 | 4.71 | Hot potato |
lat_pipe | 378.4 | 81.56 | 4.64 | |
lat_unix | 764.5 | 107.48 | 7.11 | |
lat_syscall null | 0.82 | 4.73 | 0.17 | |
lat_proc procedure | 0.21 | 0.21 | 1.00 | Procedure call |
lat_proc fork | 4334 | 5706 | 0.76 | Process creation |
lat_proc exec | 4600 | 6400 | 0.72 | |
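The relative-performance column in the latency table is consistent with dividing the Linux latency by the Wombat latency (for example, 190.8 / 6.48 ≈ 29.44), so values above 1.0 mean Wombat is faster. The short sketch below recomputes a few rows that way; the helper name `relative_latency_perf` and the row selection are our own.

```python
# (benchmark, Linux latency, Wombat latency) in microseconds, copied from the table above.
LATENCIES = [
    ("lat_ctx -s 0 2", 190.8, 6.48),
    ("lat_fifo", 377.0, 80.07),
    ("lat_syscall null", 0.82, 4.73),
    ("lat_proc fork", 4334, 5706),
]

def relative_latency_perf(linux_us: float, wombat_us: float) -> float:
    # Lower latency is better, so Wombat's speedup over Linux is
    # Linux time / Wombat time; values above 1.0 favour Wombat.
    return linux_us / wombat_us

for name, linux, wombat in LATENCIES:
    rel = relative_latency_perf(linux, wombat)
    verdict = "Wombat faster" if rel > 1 else "Linux faster"
    print(f"{name:18} {rel:6.2f}  {verdict}")
```

For the bandwidth table, where higher is better, the ratio is inverted (Wombat / Linux), which again puts values above 1.0 in Wombat's favour.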
LMBench bandwidth test results, in MB/s. Higher is better.
Bandwidth | Linux | Wombat | Relative Performance (Wombat/Linux) |
---|---|---|---|
bw_file_rd 1024 io_only | 39.38 | 12.43 | 0.32 |
bw_mmap_rd 1024 | 106.7 | 106.1 | 0.99 |
bw_mem 1024 | 416.0 | 416.0 | 1.00 |
bw_mem_wr | 229.9 | 229.0 | 1.00 |
bw_pipe | 10.15 | 15.31 | 1.51 |
bw_unix | 24.23 | 11.32 | 0.47 |
The following numbers are the results of the AIM7 benchmark on native Linux and Wombat, in jobs/min/task. Higher is better.
Metric | Linux | Wombat | Relative Performance (Wombat/Linux) |
---|---|---|---|
1 Task | 47.52 | 46.32 | 0.97 |
2 Tasks | 24.77 | 24.12 | 0.97 |
3 Tasks | 16.74 | 16.31 | 0.97 |
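Because AIM7 is reported per task, multiplying by the task count gives an aggregate throughput, assuming "jobs/min/task" means total jobs per minute divided by the number of tasks. The sketch below does that arithmetic on the table values; `total_throughput` is a hypothetical helper name.

```python
# AIM7 results copied from the table above: tasks -> (Linux, Wombat) in jobs/min/task.
AIM7 = {1: (47.52, 46.32), 2: (24.77, 24.12), 3: (16.74, 16.31)}

def total_throughput(per_task: float, tasks: int) -> float:
    # Assumes jobs/min/task times the number of tasks gives aggregate jobs/min.
    return per_task * tasks

for tasks, (linux, wombat) in AIM7.items():
    print(f"{tasks} task(s): Linux {total_throughput(linux, tasks):.2f} jobs/min, "
          f"Wombat {total_throughput(wombat, tasks):.2f} jobs/min, "
          f"relative {wombat / linux:.2f}")
```

Under that reading, aggregate throughput stays roughly flat as tasks are added on both systems, and Wombat tracks native Linux at about 97% throughout.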
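In addition to the security and isolation benefits of virtualizing Linux, virtualized Linux on ARM can be clearly faster than native Linux in several areas, particularly context-switch latency and the lat_fifo, lat_pipe and lat_unix latency tests, largely because of its use of fast address-space switching (FASS). Native Linux remains faster on null system calls, process creation and some bandwidth tests.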