Trustworthy Systems

Mitigating microarchitectural timing channels with hardware-provided operations

This page shows data gathered from a quantitative study of timing channels exploiting microarchitectural state on multiple generation of Intel and ARM processors. We measure channel capacities of the raw channels, i.e., without any countermeasures applied, as well as with all architected scrubbing measures (including disabling the data prefetcher).

In the wake of the Spectre attack, Intel released a microcode update with a suite of defence mechanisms, collectively referred as indirect branch control (IBC). As IBC performs a degree of sanitising of the branch predictor, this could be a possible mechanism for eliminating the channels we observe. We therefore include the IBC as part of the hardware scrubbing measures on the tested x86 platforms.

Leakage is evaluated in a covert channel scenario. There are two security domains which time-share a processor core. One domain, Hi contains secrets and is confined by the system's security policy, meaning that the secrets must not leak outside. A second domain, Low, can communicate freely with the outside world. The attacker controls a spy process running in Low and has managed to place a Trojan in Hi. The Trojan has access to the secret and is actively trying to leak it via the channel under investigation.

Channels

We investigate five microarchitectural timing channels:

  1. L1-D-cache channel
  2. L1-I-cache channel
  3. TLB channel (not demonstrated previously on a tagged TLB)
  4. Branch-target buffer (BTB) channel (not demonstrated previously)
  5. Branch-history buffer (BHB) channel

We also examine the mitigated timing channel without Intel's Spectre fix for comparing the effectiveness of Intel's IBC mechanism.

The implementation of the channels is specific to the exploited microarchitectural feature, and explained in the respective sections. What is common is that we use prime-and-probe attacks, where the spy primes the microarchitectural component to force it into a defined state, waits until the Trojan executed for a time slice, and then probes the state for the effect of the Trojan's execution.

For example, the spy primes the D-cache by traversing a buffer large enough to fill it with its own data. The Trojan touches its own buffer, to replace a specific number of cache lines with its own data, this number is the input symbol (s) of the channel. When probing, the spy measures its own execution time, which depends on the number of cache lines replaced by the Trojan; the measured probe time is the output symbol. In the absence of a channel, the output symbol will be uncorrelated to the input symbol.

Hardware Platforms

Platforms used for evaluation
Name Haswell Skylake Sabre Hikey
Architecture x86 x86 ARMv7 ARMv8
Microarchitecture / Core Haswell Skylake Cortex A9 Cortex A53
Processor / SoC i7-4700 i7-6700 i.MX6 Kirin 620
Vintage 2013 2015 2011 2015
Clock rate 3.4 GHz 3.4 GHz 0.8 GHz 1.2 GHz
Execution mode 64 bit 64 bit 32 bit 32 bit
Time slice 1 ms 1 ms 1 ms 1 ms
L1-D size 32 KiB 32 KiB 32 KiB 32 KiB
L1-D sets × associativity 64×8 64×8 256×4 128×4
L1-I size 32 KiB 32 KiB 32 KiB 32 KiB
L1-I sets × associativity 64×8 64×8 256×4 256×2
I+D TLB size (entries) 64+64 128+64 32+32 10+10
Unified TLB size (entries) 1024 1536 128 512
BTB size (entries) ??? ??? 512 256

Results

For each channel and evaluation platform we show the channel matrix, which represents the conditional probability of observing an output symbol (vertical axis) given an input symbol (horizontal axis). We represent the channel matrix as a heat map, brighter colours indicating a higher probability.

In the absence of a channel, the output symbol should be uncorrelated to the input symbol, which will result in a channel matrix without any horizontal variation. Any such variation is a clear indication of a channel.

We quantify the channel by computing the mutual information (MI), 𝓜, of the channel, which represents the average number of bits of information a computationally-unlimited attacker can learn from each input by observing the output. For the mitigated channels we also provide 𝓜0, which is the 95% upper confidence interval on the capacity of the apparent channel resulting from purely random sampling error. If 𝓜>𝓜0 we say that the observations are inconsistent with MI being zero and hence there is a leak, else the dataset contains no evidence of a leak. With each channel, n is the number of measurements made for each input symbol.

We use the leakiEst, a tool for estimating information leakage, for calculating the MI 𝓜 and the 𝓜0.

We show all channel matrices and capacities here, followed by a summary of the results.

L1 D-cache channel

This is the timing channel created by domains competing for lines in the data cache. We use a prime-and-probe attack from the Mastik toolkit, where spy and Trojan read data from a buffer large enough to cover the cache.

L1 D-cache Channel Matrix
Platform No migitation With mitigation
Haswell
(Intel i7-4770)
64bit
n=15436, 𝓜=4.0 b
n=15444, 𝓜=0.0005 b, 𝓜0=0.0005 b
Skylake
(Intel i7-6700)
64bit
n=15366, 𝓜=4.7 b
n=19251, 𝓜=0.0008 b, 𝓜0=0.0009 b
Sabre
(ARMv7 Cortex A9)
32bit
n=3808, 𝓜=2.0 b
n=3810, 𝓜=0.001 b, 𝓜0=0.001 b
Hikey
(ARMv8 Cortex A53)
32bit
n=7728, 𝓜=1.5 b
n=7693, 𝓜=0.0008 b, 𝓜0=0.0008 b

L1 I-cache channel

This is the timing channel created by domains competing for lines in the instruction cache. We use a prime-and-probe attack from the Mastik toolkit, where spy and Trojan execute a chain of jumps from a buffer large enough to cover the cache.

L1 I-cache Channel Matrix
Platform No migitation With mitigation
Haswell
(Intel i7-4770)
64bit
haswell 64bit
n=15532, 𝓜=0.3 b
n=haswell
n=15482, 𝓜=0.0007 b, 𝓜0=0.0008 b
Skylake
(Intel i7-6700)
64bit
skylake
n=15438, 𝓜=2.1 b
n=skylake
n=15406, 𝓜=0.0006 b, 𝓜0=0.0006 b
Sabre
(ARMv7 Cortex A9)
32bit
n=3808, 𝓜=2.5 b
n=3824, 𝓜=0.0013 b, 𝓜0=0.0013 b
Hikey
(ARMv8 Cortex A53)
32bit
n=3822, 𝓜=4.1 b
n=3797, 𝓜=0.019 b, 𝓜0=0.016b

TLB channel

 

This is the timing channel created by domains competing for entries in the tagged TLB. We use a prime-and-probe attack, where spy and Trojan read from a set of pages large enough to use half the available TLB entries (we use half the TLB to avoid self-interference).

TLB Channel Matrix
Platform No migitation With mitigation
Haswell
(Intel i7-4770)
64bit
haswell 64bit
n=15511, 𝓜=2.3 b
n=haswell
n=15529, 𝓜=0.0005 b, 𝓜0=0.0005 b
Skylake
(Intel i7-6700)
64bit
skylake
n=15444, 𝓜=2.1 b
n=skylake
n=15494, 𝓜=0.0008 b, 𝓜0=0.0008 b
Sabre
(ARMv7 Cortex A9)
32bit
n=7680, 𝓜=0.6 b
n=7617, 𝓜=0.0005 b, 𝓜0=0.0005 b
Hikey
(ARMv8 Cortex A53)
32bit
n=1589, 𝓜=0.5 b
n=1857, 𝓜=0.0023 b, 𝓜0=0.0024 b

BTB channel

This is the timing channel created by domains competing for entries in the buffer of branch targets contained in the branch-predictor unit. We use a prime-and-probe attack, where spy and Trojan execute a set of chained branch instructions, sufficient to fill the BTB.

BTB Channel Matrix
Platform No migitation With mitigation
Haswell
(Intel i7-4470)
64bit
haswell 64bit
n=7663, 𝓜=1.5 b
n=haswell
n=7748, 𝓜=0.0008 b, 𝓜0=0.0008 b
Skylake
(Intel i7-6700)
64bit
skylake
n=7727, 𝓜=0.05 b
n=skylake
n=7747, 𝓜=0.0014 b, 𝓜0=0.0015 b
Sabre
(ARMv7 Cortex A9)
32bit
n=1857, 𝓜=0.0075 b
n=1859, 𝓜=0.0041 b, 𝓜0=0.0044 b
Hikey
(ARMv8 Cortex A53)
32bit
n=3771, 𝓜=0.5 b
n=3774, 𝓜=0.003 b, 𝓜0=0.0014 b

BHB channel

This is the timing channel created by the branch predictor remembering whether a particular branch was taken (this is inherently a single-bit channel). The attack includes a sequence of conditional branches that are always taken, to set the history to a known state. This is followed by a branch that conditionally skips over 256 nop instructions. The Trojan encodes a single bit by training the prediction for the last branch. The spy measures the cost of executing the nop instructions, a mis-predict increases latency.

BHB Channel Matrix
Platform No migitation With mitigation
Haswell
(Intel i7-4770)
64bit
haswell 64bit
n=511765, 𝓜=1.0 b
n=haswell
n=511852, 𝓜=0.0005 b, 𝓜0=0.0 b
Skylake
(Intel i7-6700)
64bit
skylake
n=511487, 𝓜=1.0 b
n=skylake
n=511959, 𝓜=0.0003 b, 𝓜0=0.0003 b
Sabre
(ARMv7 Cortex A9)
32bit
n=511434, 𝓜=1.0 b
n=511736, 𝓜=0.0 b, 𝓜0=0.0005 b
Hikey
(ARMv8 Cortex A53)
32bit
n=511241, 𝓜=0.9 b
n=511668, 𝓜=0.04 b, 𝓜0=0.0 b

Mitigated timing channel without Intel's Spectre Fix (IBC)

As IBC performs a degree of sanitising of the branch predictor, the timing channels could still be observable without the IBC enabled. Thus, we re-run the instruction-related attacks on the Intel platforms to find out whether the channels remain without using the IBC.

Channel Matrix without Intel's IBC
Attack Haswell Skylake
L1 I-cache
sandy no
n=15480, 𝓜=0.02 b, 𝓜0=0.0006 b
sandy no
n=15472, 𝓜=0.01 b, 𝓜0=0.0008 b
BTB
sandy no
n=7733, 𝓜=0.3 b, 𝓜0=0.0013 b
sandy no
n=7713, 𝓜=0.0021 b, 𝓜0=0.0017
BHB
sandy no
n=511372, 𝓜=0.6 b, 𝓜0=0.0 b
sandy no
n=511219, 𝓜=0.02 b, 𝓜0=0.02

Summary of Results

Channel capacities 𝓜 in bits: unmitigated, maximally mitigated (zero-channel bound 𝓜0). Bold numbers are without IBC. Numbers in red indicate leakage.
Channel Haswell Skylake Sabre Hikey
L1-D 4.0 0.0005 (0.0005) 4.7 0.0008 (0.0009) 2.0 0.001 (0.001) 1.5 0.0008 (0.0008)
L1-I 0.3 0.0007 (0.0008) 2.1 0.0006 (0.0006) 2.5 0.0013 (0.0013) 4.1 0.019 (0.016)
0.02 (0.0006) 0.01 (0.0008)
TLB 2.3 0.0005 (0.0005) 2.1 0.0008 (0.0008) 0.6 0.0005 (0.0005) 0.5 0.0023 (0.0024)
BTB 1.5 0.0008 (0.0008) 0.05 0.0014 (0.0015) 0.0075 0.0041 (0.0044) 0.5 0.003 (0.0014)
0.3 (0.0013) 0.0021 (0.0017)
BHB 1.0 0.0005 (0.0) 1.0 0.0003 (0.0003) 1.0 0.0 (0.0005) 1.0 0.04 (0.0)
0.6 (0.0) 0.02 (0.02)

We can see from those results that, thanks to the recent addition of the IBC mechanism on Intel platforms, it is possible to close all channels on those processors, while before the availability of IBC there were significant uncloseable channels. But note that our scrubbing depends on flushing the I- and D-caches. On Intel platforms, this is presently only supported on a flush of the full cache hierarchy, which is prohibitively expensive. To make these defences practical, Intel would have to provide a targeted L1-cache flush operation.

On ARM we see that all channels can be closed on the older Sabre platform, which uses an in-order A9 core, while the newer, out-of-order A53-based Hikey has a number of uncloseable channels.

Back

Project home page