L4 performance

The tables below show message-passing IPC costs (one-way, i.e. half of one round trip) for various CPUs and L4 configurations. Times are clock cycles spent in the kernel, including the overhead of entering and exiting kernel mode, and are given for several payload sizes (the number of message registers, or MRs, used; each MR holds one machine word).
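As a rough illustration of how the reported figures relate to a raw measurement, the sketch below halves a round-trip cycle count to get the one-way cost, converts cycles to microseconds at a given clock rate, and computes the payload size in bytes. It assumes a 32-bit machine word (4 bytes), as on the ARM cores listed here; the function names and the 302-cycle input are hypothetical and not part of any L4 API or measurement.

```python
def one_way_cycles(round_trip_cycles: float) -> float:
    """Reported IPC cost: half of one measured round trip."""
    return round_trip_cycles / 2


def cycles_to_us(cycles: float, clock_mhz: float) -> float:
    """Convert cycles to microseconds: a 1 MHz clock ticks once per microsecond."""
    return cycles / clock_mhz


def payload_bytes(num_mrs: int, word_bytes: int = 4) -> int:
    """Each message register holds one machine word (assumed 4 bytes on 32-bit ARM)."""
    return num_mrs * word_bytes


# Hypothetical raw measurement: a 302-cycle round trip on a 400 MHz core.
cost = one_way_cycles(302)
print(cost)                     # 151.0
print(cycles_to_us(cost, 400))  # 0.3775
print(payload_bytes(16))        # 64
```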

These results are from our L4 version, NICTA::Pistachio-embedded, and our para-virtualized Linux, Wombat. Neither system is supported any longer: both have been superseded by OKL4 and OK Linux from Open Kernel Labs, which have since been developed further, including additional performance improvements.

NICTA::Pistachio-embedded IPC

ARM (5 physical message registers)

These numbers are for IPC between separate address spaces (including context switch).

Processor	Clock	0..4 MRs (cycles)	8 MRs (cycles)	16 MRs (cycles)
XScale PXA255 (ARMv5)	400 MHz	151	188	228
StrongARM SA1100 (ARMv4)	206 MHz	131	141	161
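To put the cycle counts in wall-clock terms, and to see how cost grows with payload, the sketch below divides each short-message figure by the clock rate from the table and computes the average extra cost per message register between the 8-MR and 16-MR columns. The per-MR figure is only a linear difference over two data points, an assumption for illustration rather than a model of the kernel's copy path.

```python
# Figures copied from the ARM table above.
# name: (clock in MHz, cycles for 0..4 MRs, 8 MRs, 16 MRs)
ARM_IPC = {
    "XScale PXA255": (400, 151, 188, 228),
    "StrongARM SA1100": (206, 131, 141, 161),
}


def summarize(clock_mhz, short_cycles, mr8_cycles, mr16_cycles):
    """Return (ns for a short IPC, average extra cycles per MR between 8 and 16 MRs)."""
    short_ns = short_cycles * 1000 / clock_mhz      # 1 MHz = 1 cycle per microsecond
    per_mr = (mr16_cycles - mr8_cycles) / (16 - 8)  # linear difference over two points
    return short_ns, per_mr


for name, figures in ARM_IPC.items():
    short_ns, per_mr = summarize(*figures)
    print(f"{name}: {short_ns:.1f} ns for 0..4 MRs, ~{per_mr:.1f} cycles per extra MR")
```

On these numbers the XScale needs more cycles per IPC than the SA1100, but its higher clock still gives the shorter wall-clock time (377.5 ns versus about 635.9 ns).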

Fast address-space switching (FASS, which builds on the ARM Fast Context Switch Extension, FCSE) was enabled for these measurements.

Wombat: para-virtualised Linux

Wombat, our architecture-independent para-virtualised Linux for L4-embedded, runs on ARM, x86 and MIPS. On ARMv4 and ARMv5 processors, such as ARM9 cores and the XScale, Wombat benefits from the fast address-space switching (FASS) implemented in L4-embedded; native Linux distributions do not support FASS.
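Part of why FASS helps on these cores is that ARMv4/v5 caches are virtually indexed and tagged, so a conventional address-space switch must flush them. The ARM Fast Context Switch Extension (FCSE) that FASS relies on relocates addresses in the lowest 32 MB of the virtual address space by a 7-bit process ID, so identical low addresses in different processes land in different 32 MB slots and their cache lines do not collide. The sketch below models only that address-translation step; the function name is hypothetical, and it is a simplified illustration, not L4-embedded's implementation, which involves more than this mapping.

```python
SLOT_BITS = 25                # each slot is 32 MB = 2**25 bytes
SLOT_SIZE = 1 << SLOT_BITS
MAX_PID = (1 << 7) - 1        # the FCSE process ID is 7 bits wide (0..127)


def fcse_modified_va(va: int, pid: int) -> int:
    """Return the modified virtual address that the caches and TLB see for va."""
    if not 0 <= pid <= MAX_PID:
        raise ValueError("FCSE PID must fit in 7 bits")
    if not 0 <= va < (1 << 32):
        raise ValueError("va must be a 32-bit address")
    if va < SLOT_SIZE:                 # only the lowest 32 MB is relocated
        return va | (pid << SLOT_BITS)
    return va                          # higher addresses pass through unchanged


# Two processes using the same low address map to different slots.
a = fcse_modified_va(0x8000, pid=1)
b = fcse_modified_va(0x8000, pid=2)
print(hex(a), hex(b))  # 0x2008000 0x4008000
```

With PID 0 nothing is relocated, which matches the behaviour of a system that does not use FCSE at all.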

Wombat ARM: XScale PXA255 @ 200 MHz

These numbers are LMBench results for native ARM Linux and Wombat on the PLEB2 reference platform.

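LMBench latency test results. Times are in microseconds; lower is better. Relative performance is the Linux latency divided by the Wombat latency, so values above 1 mean Wombat is faster.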

Latencies	Linux	Wombat	Relative Performance	Comment
lat_ctx -s 0 2	190.8	6.48	29.44	Context switch latencies
lat_ctx -s 0 3	197.1	18.82	10.47
lat_ctx -s 0 4	199.5	19.78	10.09
lat_ctx -s 0 10	215.7	44.07	4.89
lat_ctx -s 4 2	257.7	7.15	36.04
lat_ctx -s 4 3	259.3	23.26	11.15
lat_ctx -s 4 4	293.4	40.28	7.28
lat_ctx -s 4 10	285.1	141.96	2.01
lat_fifo	377.0	80.07	4.71	Hot potato
lat_pipe	378.4	81.56	4.64
lat_unix	764.5	107.48	7.11
lat_syscall null	0.82	4.73	0.17
lat_proc procedure	0.21	0.21	1.00	Process creation
lat_proc fork	4334	5706	0.76
lat_proc exec	4600	6400	0.72
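Because lower latency is better, the relative-performance column in the latency table is the Linux time divided by the Wombat time. The sketch below recomputes it for a few rows copied from the table and labels which system is faster; the row selection and function name are illustrative only.

```python
# (benchmark, Linux µs, Wombat µs), copied from the latency table above
LATENCIES = [
    ("lat_ctx -s 0 2", 190.8, 6.48),
    ("lat_pipe", 378.4, 81.56),
    ("lat_syscall null", 0.82, 4.73),
    ("lat_proc fork", 4334, 5706),
]


def latency_speedup(linux_us: float, wombat_us: float) -> float:
    """Lower latency is better, so the ratio is Linux / Wombat (> 1: Wombat faster)."""
    return linux_us / wombat_us


for name, linux, wombat in LATENCIES:
    r = latency_speedup(linux, wombat)
    verdict = "Wombat faster" if r > 1 else "Linux faster"
    print(f"{name:18} {r:6.2f}  {verdict}")
```

The pattern matches the table: context switches and pipe latency favour Wombat, while the null system call and fork remain cheaper on native Linux.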

LMBench bandwidth test results, in MB/s. Higher is better.

Bandwidth	Linux	Wombat	Relative Performance
bw_file_rd 1024 io_only	39.38	12.43	0.32
bw_mmap_rd 1024	106.7	106.1	0.99
bw_mem 1024	416.0	416.0	1.00
bw_mem_wr	229.9	229.0	1.00
bw_pipe	10.15	15.31	1.51
bw_unix	24.23	11.32	0.47
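For bandwidth the direction flips: higher is better, so relative performance is the Wombat figure divided by the Linux figure. The sketch below recomputes that column from the table values and lists the benchmarks where Wombat falls noticeably below native Linux; the 0.95 cutoff is an arbitrary choice for illustration.

```python
# (benchmark, Linux MB/s, Wombat MB/s), copied from the bandwidth table above
BANDWIDTH = [
    ("bw_file_rd 1024 io_only", 39.38, 12.43),
    ("bw_mmap_rd 1024", 106.7, 106.1),
    ("bw_mem 1024", 416.0, 416.0),
    ("bw_mem_wr", 229.9, 229.0),
    ("bw_pipe", 10.15, 15.31),
    ("bw_unix", 24.23, 11.32),
]


def bandwidth_ratio(linux_mbs: float, wombat_mbs: float) -> float:
    """Higher bandwidth is better, so the ratio is Wombat / Linux (> 1: Wombat faster)."""
    return wombat_mbs / linux_mbs


ratios = {name: round(bandwidth_ratio(l, w), 2) for name, l, w in BANDWIDTH}
slower = [name for name, r in ratios.items() if r < 0.95]  # arbitrary cutoff
print(ratios)
print("Wombat noticeably slower:", slower)
```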

The following numbers are the results of the AIM7 benchmark on native Linux and Wombat. Units are jobs/min/task; higher is better.

Metric	Linux	Wombat	Relative Performance
1 Task	47.52	46.32	0.97
2 Tasks	24.77	24.12	0.97
3 Tasks	16.74	16.31	0.97
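Since the AIM7 figures are per task, multiplying by the number of tasks gives an aggregate throughput in jobs per minute, which makes the rows easier to compare. A minimal sketch, assuming the unit is exactly as stated (jobs per minute per task); the function name is hypothetical.

```python
# (tasks, Linux jobs/min/task, Wombat jobs/min/task), copied from the AIM7 table above
AIM7 = [(1, 47.52, 46.32), (2, 24.77, 24.12), (3, 16.74, 16.31)]


def aggregate(tasks: int, per_task: float) -> float:
    """Total jobs/min across all tasks, assuming the per-task unit stated above."""
    return tasks * per_task


for tasks, linux, wombat in AIM7:
    rel = wombat / linux  # higher is better, so > 1 would favour Wombat
    print(f"{tasks} task(s): Linux {aggregate(tasks, linux):.2f}, "
          f"Wombat {aggregate(tasks, wombat):.2f} jobs/min, relative {rel:.2f}")
```

Aggregate throughput stays roughly flat as tasks are added, and Wombat stays at about 97% of native Linux throughout.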

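Besides the security and isolation benefits of virtualizing Linux, virtualized Linux on ARM can show clear performance gains over native Linux in several areas, particularly context switching and pipe, FIFO and UNIX-socket latency, largely because of its use of fast address-space switching (FASS). System calls, process creation and file-read bandwidth remain slower than on native Linux.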

Trustworthy Systems
