# Mapped Separation Logic Rafal Kolanski and Gerwin Klein ${\bf Sydney~Research~Lab.,~NICTA^{\star},~Australia}$ School of Computer Science and Engineering, UNSW, Sydney, Australia {rafal.kolanski|gerwin.klein}@nicta.com.au **Abstract.** We present Mapped Separation Logic, an instance of Separation Logic for reasoning about virtual memory. Our logic is formalised in the Isabelle/HOL theorem and it allows reasoning on properties about page tables, direct physical memory access, virtual memory access, and shared memory. Mapped Separation Logic fully supports all rules of abstract Separation Logic, including the frame rule. ### 1 Introduction Let memory be a function from addresses to values. Many a paper and semantics lecture start off with this statement. Unfortunately, it is not how memory on any reasonably complex machine behaves. Memory, on modern hardware, is virtualised and addresses pass through a translation layer before physical storage is accessed. Physical storage might be shared between different virtual addresses. Additionally, the encoding of how the translation works (the page table) resides in physical storage as well, and might be accessible through the translation. These two properties make the function model incorrect. Operating systems can be made to provide the illusion of plain functional memory to applications. This illusion, however, is brittle. Mapping and sharing virtual pages are standard Unix system calls available to applications and they can easily destroy the functional view. What is worse, on the systems software layer, e.g. for the operating system itself, device drivers, or system services, the virtual memory subsystem is fully visible: sharing is used frequently and the translation layer is managed explicitly. Building on earlier work [10], we show in this paper how Separation Logic [14], a tool for reasoning conveniently about memory and aliasing, can be extended to virtual memory. We do this in a way that preserves the abstract reasoning of Separation Logic as well as its critical locality feature: the frame rule. The technical contributions in this work are the following: a Separation Logic allowing convenient, abstract reasoning about both the virtual and physical layers of memory, supporting the frame rule for arbitrary writes to memory (including the page table), <sup>\*</sup> NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council. - a separating conjunction extending to virtually shared memory, - a framework that makes the core logic independent of particular page table implementations, and - a case study on page allocation that demonstrates the logic. All of the work presented in this paper is formalised in the Isabelle/HOL theorem prover [11]. For presentation in this paper, we analyse the logic based on a simple, imperative programming language with arbitrary pointer arithmetic. It is our intention to connect this logic with earlier work on a precise memory model of C by Tuch et al. [16]. We currently do not take into account memory permissions beyond whether a virtual address is mapped/unmapped. Such permissions are merely extra properties of virtual addresses; they can be easily integrated into our logic. ## 2 Intuition Separation Logic traditionally models memory as a partial function from addresses to values, called the heap. The two central tools of Separation Logic are the separating conjunction and the frame rule. Separating conjunction $P \wedge^* Q$ is an assertion on heaps, stating that the heap can be split into two separate parts on which the conjuncts P and Q hold respectively. It expresses anti-aliasing conditions conveniently. The frame rule allows us to conclude $\{P \wedge^* R\}$ $f\{Q \wedge^* R\}$ from $\{P\}$ $f\{Q\}$ for any R. This expresses that the actions of f are local to the heaps described by P and Q, and can therefore not affect any separate heaps described by R. As we shall see below, with virtual memory, both of these tools break and both can be repaired. Virtual memory is a hardware-enforced abstraction allowing each executing process its own view of memory, into which areas of physical memory can be dynamically inserted. It adds a level of indirection: virtual addresses potentially map to physical addresses. Memory access is ordinarily done through virtual addresses only, although hardware devices may modify physical memory directly. This mapping layer is encoded in the heap itself, in a page table data structure. There are many flavours of such encodings, using multiple levels, different sizes of basic blocks (e.g. super pages), dynamically different table depths, and even hash tables. Usually, the encoding and minimum granularity (page size) is dictated by the hardware. The anchor of the page table in physical memory is called its root. Knowing the specific encoding, the full set of virtual-to-physical mappings for a process can be constructed given the heap and the root. Given the additional layer of indirection, two different virtual addresses may resolve to the same physical address, and separating conjunction breaks, because on these addresses it does not end up providing separation. Additionally, although a memory update to the page table may only locally change one byte of physical memory, it might completely change the view the virtual memory layer provides, affecting a whole number of seemingly unrelated virtual addresses. A local action might therefore have non-local effects. This breaks the frame rule. We show how to repair both tools for virtual memory in the remainder of this paper. Fig. 1. Maps-to assertions on the heap, virtual map and address space. We will henceforth restrict our use of the word heap to refer to physical memory, and we will call the virtual-to-physical translation the virtual map. This virtual map provides the first abstraction layer in our model of virtual memory: for the development of Separation Logic on virtual memory, we do not need to know how the encoding works precisely. We only need to know that there exists a way of decoding, and, as we shall see later, that there is a way of finding out which page table entries contribute to looking up a particular virtual address. As the rest of the development is independent of the encoding, we use only a very simple implementation for the case study presented here: a one-level page table. For reasoning about virtual memory we wish make statements on three levels, shown in Fig. 1: physical-to-values, virtual-to-physical, and virtual-tovalues. Given our two partial functions (heap and vmap), this may appear straight forward. The central question that arises is: which part of the heap should a virtual-to-physical lookup occupy? This question determines the notion of separation from lookups to other parts of the heap. Assuming for the example a one-level page table where byte x is responsible for encoding the lookup of virtual address $vp_1$ to physical address $p_2$ , there are two cases: (a) the lookup consumes x, or (b) it does not. In case (a) we cannot use separating conjunction any more to specify separation of an address $vp_2$ that happens to use the same page table entry x as there may be many virtual addresses sharing an entry in the encoding. This is a common case, and we clearly do not want to exclude it. The other extreme would be to say the lookup consumes no resources. In earlier work [10] we came to the result that this model cannot support the frame rule for arbitrary programs: a physical write to the page table would be separate to any virtual-to-physical lookup, but the write might have the non-local effect of changing that apparently separate mapping. In this paper, we are therefore proposing a solution in between: a page table lookup consumes *parts* of all the entries in the page table that are involved in the lookup. This can be seen as slicing heap entries up into sub-byte addressing. If a page table lookup, i.e. a virtual-to-physical mapping, consumes parts of the page table entries involved in the lookup, we can make all memory updates local. We only need to demand that the physical address being changed is fully owned, such that no virtual-to-physical mappings that involve the updated location are separate under separating conjunction. This idea is similar to the model of permissions by Bornat et al. [2] for concurrent threads. The difference is that in our case we extend the domain of the heap to slices, whereas in the permission model, the range is extended. This has the advantage that the development of heap disjointness, merging, separating conjunction and the other standard operators remains unchanged, and therefore should be easy to instantiate to our target model of C [16]. The memory footprint of page table lookups has a direct formulation in our model. The next section will briefly summarise our formal abstract interface to page table encodings, before Sect. 4 introduces the model and shallow Isabelle/HOL embedding of the assertion language for virtual memory. ## 3 The Virtual Memory Environment In this section, we describe the page table abstraction our logic is based on as well as an instantiation to a simple, one-level page table. As in previous work [10], we write VPtr vp and PPtr p to distinguish virtual and physical pointers by type in Isabelle/HOL, and provide a destructor ptr-val (PPtr x) = x when we need to talk about the physical/virtual address directly. The abstract concepts of heap, virtual map, and address space are in Isabelle: ``` \begin{array}{ll} \textbf{types} & heap = pptr \rightharpoonup val \\ & vmap = vptr \rightharpoonup pptr \end{array} \quad addr\text{-}space = vptr \rightharpoonup val ``` These types do not include slicing yet. The type of pointers is a parameter to the development. To keep the presentation simple, we use 32-bit machine words for pointers as well as values of memory cells in examples. For the development of the logic in the remainder of the paper, we require the following two functions. ``` ptable-lift :: heap \Rightarrow pptr \Rightarrow vmap ptable-trace :: heap \Rightarrow pptr \Rightarrow vptr \Rightarrow pptr set ``` Both functions take the physical heap and page table root as parameters. The result of ptable-lift is the vmap encoded by the page table, whereas ptable-trace returns for each virtual address vp the set of locations in the page table that are involved in the lookup of vp. In contrast to our earlier work, we do not require an explicit formulation of the page table area in memory. We will later require five constraints on these two functions. We defer the presentation of these to Sect. 5 when the associated concepts have been introduced. As mentioned above, we use a simple one-level page table as an example instantiation. It is a contiguous physical memory structure consisting of an array of machine word pointers, where word 0 defines the physical location of page 0 in the address space, word 1 that of page 1 and so forth. While inefficient in terms of storage, its simplicity and contiguity allows for fast experimentation with particular memory layouts. The table is based on an arbitrarily chosen page size of 4096, i.e. 20 bits for the page number and 12 for the offset. Page table lookup works as expected: we extract the page number from the virtual address, go to that offset in the page table and obtain a physical frame number which replaces the top 20 bits of the address: ``` \begin{array}{lll} \mathsf{get}\text{-}\mathsf{page}\ vp & \equiv \ \mathsf{ptr}\text{-}\mathsf{val}\ vp >> 12 \\ \mathsf{ptr}\text{-}\mathsf{remap}\ (\mathsf{VPtr}\ vp)\ pg & \equiv \ \mathsf{PPtr}\ (pg\ \mathsf{AND}\ \mathsf{NOT}\ \partial x \partial FFF\ \mathsf{OR}\ vp\ \mathsf{AND}\ \partial x \partial FFF) \\ \mathsf{ptable}\text{-}\mathsf{lift}\ h\ r\ vp & \equiv \ \mathsf{case}\ h\ (r\ +\ \mathsf{get}\text{-}\mathsf{page}\ vp)\ \mathsf{of}\ \mathsf{None} \Rightarrow \mathsf{None} \\ & |\ \lfloor addr\ \rfloor \Rightarrow \\ \mathsf{if}\ addr\ !!\ 0\ \mathsf{then}\ \lfloor \mathsf{ptr}\text{-}\mathsf{remap}\ vp\ addr\ \rfloor\ \mathsf{else}\ \mathsf{None} \\ \mathsf{ptable}\text{-}\mathsf{trace}\ h\ r\ vp & \equiv \ \mathsf{case}\ h\ (r\ +\ \mathsf{get}\text{-}\mathsf{page}\ vp)\ \mathsf{of}\ \mathsf{None} \Rightarrow \emptyset \\ & |\ |\ addr\ | \Rightarrow \{r\ +\ \mathsf{get}\text{-}\mathsf{page}\ vp\} \end{array} ``` AND, OR and NOT are bitwise operations on words. The operator >> is bitwise right-shift on words. The term $x \, !! \, n$ stands for bit n in word x. We use bit 0 to denote whether a mapping is valid. The optional value type has two constructors: None for no value and |value| otherwise. # 4 Separation Logic Assertions on Virtual Memory In this section, we describe how the classical Separation Logic assertions can be extended to hold for a model with virtual memory. As mentioned in Sect. 2, the idea is to extend the domain of the heap to slices. The smallest useful granularity of heap cells is that of virtual addresses. One page table entry can at most be responsible for all virtual addresses in the machine. Thus, the domain of the heap becomes $pptr \times vptr$ and (p, vp) stands for the vp-slice of physical address p. In our shallow embedding, Separation Logic assertions are predicates on this new heap and the root of the page table: ``` types fheap = (pptr \times vptr) \rightarrow val map\text{-}assert = (fheap \times pptr) \Rightarrow bool ``` With this model, the basic definition of heap merge (++), domain, and disjointness $(\perp)$ remain completely standard: ``` \begin{array}{lll} h_1 \ ++ \ h_2 & \equiv \ \lambda x. \ \mathsf{case} \ h_2 \ x \ \mathsf{of} \ \mathsf{None} \Rightarrow h_1 \ x \ | \ \lfloor y \rfloor \Rightarrow \lfloor y \rfloor \\ \mathsf{dom} \ m & \equiv \ \{ a \ | \ m \ a \neq \mathsf{None} \} \\ h_1 \ \bot \ h_2 & \equiv \ \mathsf{dom} \ h_1 \cap \mathsf{dom} \ h_2 = \emptyset \end{array} ``` The Separation Logic connectives for empty heap, true, conjunction, and implication stay almost unchanged. We additionally supply the page table root. Since the definitions are almost unchanged it is unsurprising that the usual properties such as commutativity and associativity and distribution over lifted normal conjunction continue to hold. The interesting, new assertions are the three *maps-to* statements that we alluded to in Fig. 1. Corresponding to the traditional Separation Logic predicate is the physical-to-value mapping: $$p \mapsto v \equiv \lambda(h, r). \ (\forall vp. \ h \ (p, vp) = |v|) \land \mathsf{dom} \ h = \{p\} \times \mathcal{U}$$ For this assertion, we require that all slices of the heap at physical address p map to the value v and that the domain of the heap is exactly the set of all pairs with p as the first component ( $\mathcal{U}$ is the universal set). This predicate would typically be used for direct memory access in devices or for low-level kernel operations. The next assertion is the virtual-to-physical mapping: ``` vp \mapsto_v p \equiv \lambda(h, r). let heap = \text{h-view } h \ vp; \ vmap = \text{ptable-lift } heap \ r in vmap \ vp = |p| \land \text{dom } h = \text{ptable-trace } heap \ r \ vp \times \{vp\} ``` where h-view $fh\ vp \equiv \lambda p$ . $fh\ (p,\ vp)$ . Here, we first lift the page table out of the heap to get the abstract vmap which provides us with the translation from vp to p. Additionally, we assert that the domain of the heap is the vp slice of all page table entries that are involved in the lookup. With h-view we project out the vp slice for all addresses so that the page table lift function can work on a traditional $pptr \rightharpoonup val$ heap. Putting the two together, we arrive at the virtual-to-value mapping that corresponds to the level that most of the OS and user code will be reasoning at. ``` vp \mapsto v \equiv |\exists |p. vp \mapsto_v p \wedge^* p \mapsto v ``` where $\lfloor \exists \rfloor x$ . $P x \equiv \lambda s$ . $\exists x$ . P x s. The predicate implies that the lookup path is separate from the physical address p the value v is at. This is the case for all situations we have encountered so far and, as the case study shows, works for page table manipulations as well. It is possible to define a weaker predicate without the separation, but this creates a special case for the assignment rule: if a write changes the page table for the address the write goes to, the post condition would have to take the change in the translation layer into account directly. The usual variations on the maps-to predicate can be defined in the standard manner again. We let the arrow notation apply both to the physical-to-value and the virtual-to-value mapping and rely on context and type inference picking the correct underlying definition. ``` \begin{array}{l} p \mapsto - \equiv \lfloor \exists \rfloor v. \ p \mapsto v \\ p \hookrightarrow v \equiv p \mapsto v \wedge^* \top \\ S \mapsto - \equiv \mathsf{fold} \ \mathsf{op} \wedge^* \ (\lambda x. \ x \mapsto -) \ \Box \ S \end{array} ``` The latter definition refers to a finite set S of addresses. It states that all of the elements hold separately. We have proved that both x-to-value mappings are intuitionistic [14]. Note that, although the virtual-to-value mapping is not domain exact [14] due to the existential quantifier on p, the mapping is still precise [12]: ``` precise P \equiv \forall \ Q \ R. \ P \wedge^* \ Q \ | \wedge | \ R = (P \wedge^* \ Q) \ | \wedge | \ (P \wedge^* \ R) ``` That is, it distributes over conjunction in both directions. We write $P [\land] Q$ for the lifted conjunction of assertions P and Q. ## 5 The Logic This section introduces a simple, heap based programming language with pointer arithmetic to analyse how the assertions presented above can be developed into ``` datatype com = datatype a exp = SKIP HeapLookup aexp aexp := aexp BinOp\ (val \Rightarrow val \Rightarrow val)\ aexp\ aexp com; com UnOp (val \Rightarrow val) aexp IF bexp THEN com ELSE com Const val WHILE bexp DO com datatype bexp = BConst bool \mathsf{BComp}\ (val \Rightarrow val \Rightarrow bool)\ aexp\ aexp BBinOp\ (bool \Rightarrow bool \Rightarrow bool)\ bexp\ bexp BNot bexp ``` Fig. 2. Syntax of the heap based WHILE language. Fig. 3. Semantics of arithmetic and boolean expressions. a full Separation Logic. For the meta-level proofs in this paper, we provide a deep embedding of the language into Isabelle/HOL. The language is standard, with skip, if, while, and assignment. The WHILE and IF statements potentially read from virtual memory in their guards. Assignment is the most interesting: it accesses memory through the virtual memory layer and can potentially modify this translation by writing to the page table. We leave out the simpler physical write access for the presentation here. It would be easy to add and does not increase the complexity of the language. Fig. 2 shows the Isabelle datatypes that make up the syntax of the language. Note that the left hand side of assignments can be an arbitrary arithmetic expression. For simplicity, we identify values and pointers in this language and admit arbitrary HOL functions for comparison, binary, and unary arithmetic expressions. The program states are the same states the separation assertions work on, but to the user, only the usual view of (translated) memory is available. The semantics of boolean and arithmetic expressions is shown in Fig. 3. We write $[\![B]\!]_b$ for the meaning of boolean expression B as a partial function from program state to *bool*. Analogously $[\![A]\!]$ is a partial function from state to values. All of these are straightforward, only heap lookup deserves more attention. Heap lookup only succeeds if the address the argument of the lookup evaluates to virtually maps to a value. This means that the appropriate slices that are involved in the page table lookup as well as the full cell of the target in physical memory must be available in the domain of the heap. In any execution, there will always be full memory cells available, but using our assertions from above we are able to make finer-grained distinction during program proofs. The function as-view (fh, r) $vp = (\text{let } hp = \text{h-view } fh \ vp \text{ in ptable-lift } hp \ r \rhd hp) \ vp \text{ uses h-view}$ as defined above to read the value from the heap after the address has been translated, and $f \rhd g \equiv \lambda x$ . case f x of None $\Rightarrow$ None $| \lfloor y \rfloor \Rightarrow g y$ to compose the two partial functions. Finally, Fig. 4 shows a big-step operational semantics of commands in the language. We write $\langle c,s\rangle \longrightarrow s'$ for command c, started in s, evaluates to s'. As usual, the semantics is the smallest relation satisfying the rules of Fig. 4. Non-termination is modelled by absence of the transition from the relation. In contrast to this, we model memory access failure explicitly by writing $\langle c,s\rangle \longrightarrow \mathsf{None}$ , and we do not allow accessing unmapped pages. On hardware, this would lead to a page fault and the execution of a page fault handler. This can also be modelled by making the assignment statement a conditional jump to the page fault handler, or with an abstraction layer showing that the page fault handler always establishes a known good state by mapping pages in from disk. An example of the latter has recently been shown by Alkassar et al. [1] for a kernel page fault handler, albeit not on the Separation Logic level. For the verification of the seL4 microkernel [5,6] (the larger context of this work), we intend to show absence of page faults in the kernel itself, hence our stricter model. Excluding the conditional jump is no loss of generality as we already have IF statements in the language. As with arithmetic expressions, the most interesting rule in Fig. 4 is assignment which involves heap access and which we will explain below. In the other rules, we abbreviate $[\![b]\!]_b$ $s = \lfloor \mathsf{True} \rfloor$ with $\langle b \rangle s$ , and $[\![b]\!]_b$ $s = \lfloor \mathsf{False} \rfloor$ with $\neg \langle b \rangle s$ . Failure in any part of the execution leads to failure of the whole statement. The assignment rule in the top right corner of Fig. 4 requires that both the arithmetic expressions for the left and right hand side evaluate without failure. The left hand side is taken as a virtual pointer, the right hand side as the value being assigned. The assignment succeeds if the virtual address vp is mapped and allocated. We use the notation s $[vp, - \mapsto v]$ to update all slices in the heap belonging to the physical address that vp resolves to with the new value v. Since heap writes always update full cells, all heaps in executions will only ever consist of full cells and are thus consistent with the usual, non-sliced view of memory. Slices are a tool for assertions and proofs only. Having shown the semantics, we can now proceed to defining Hoare triples. Validity is the usual: $$\{P\}\ c\ \{Q\} \equiv \forall s\ s'.\ \langle c,s \rangle \longrightarrow s' \land P\ s \longrightarrow (\exists r.\ s' = |r| \land Q\ r)$$ We do not define a separate syntactic Hoare calculus. Instead, we define validity only and then derive the Hoare rules as theorems in Isabelle/HOL directly. Fig. 5 shows the rules we have proved for this language. Again, the rules for IF, WHILE, etc., are straight forward and the same as in a standard Hoare $$\langle \mathsf{SKIP}, s \rangle \longrightarrow \lfloor s \rfloor \qquad \frac{\llbracket lval \rrbracket \ s = \lfloor v \rfloor \qquad [\llbracket val \rrbracket \ s = \lfloor v \rfloor \qquad (\mathsf{VPtr} \ vp \leftarrow -) \ s}{\langle lval := rval \ , s \rangle} \longrightarrow \lfloor s \ [\mathsf{VPtr} \ vp, - \mapsto_v \ v] \rfloor$$ $$\frac{\llbracket lval \rrbracket \ s = \mathsf{None} \ \vee \llbracket rval \rrbracket \ s = \mathsf{None}}{\langle lval := rval \ , s \rangle} \longrightarrow \mathsf{None} \qquad \frac{\llbracket lval \rrbracket \ s = \lfloor vp \rfloor \qquad (\mathsf{VPtr} \ vp \leftarrow -) \ s}{\langle lval := rval \ , s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle c_0, s \rangle \longrightarrow \lfloor s'' \rfloor \qquad \langle c_1, s'' \rangle \longrightarrow s'}{\langle c_0 \ ; \ c_1, s \rangle} \longrightarrow \frac{\langle c_0, s \rangle \longrightarrow \mathsf{None}}{\langle c_0 \ ; \ c_1, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c_0, s \rangle \longrightarrow s'}{\langle \mathsf{IF} \ b \ \mathsf{THEN} \ c_0 \ \mathsf{ELSE} \ c_1, s \rangle} \longrightarrow s'$$ $$\frac{\lceil b \rrbracket_b \ s = \mathsf{None}}{\langle \mathsf{IF} \ b \ \mathsf{THEN} \ c_0 \ \mathsf{ELSE} \ c_1, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\neg \langle b \rangle \ s}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \lfloor s \rfloor$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \lfloor s'' \rfloor}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{S'}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \ s \qquad \langle c, s \rangle \longrightarrow \mathsf{None}}{\langle \mathsf{WHILE} \ b \ \mathsf{DO} \ c, s \rangle} \longrightarrow \mathsf{None}$$ $$\frac{\langle b \rangle \$$ Fig. 4. Big-step semantics of commands. calculus. We write $P [\longrightarrow] Q \equiv \forall s. P s \longrightarrow Q s$ for lifted implication, and $\langle b \rangle s$ to denote that $[\![b]\!]_b s \neq \mathsf{None}$ . The precondition P in the IF and WHILE case must be strong enough to guarantee failure free evaluation of the condition b. The lifting rules for conjunction and disjunction, as well as the weakening rule are easy to prove, requiring only the definition of validity and separating conjunction. The interesting cases are the assignment rule and the frame rule. The assignment rule is more complex looking than the standard rule of Separation Logic. Here, both the left and right hand side of the assignment are arbitrary expressions, potentially including heap lookups that need to be evaluated first. This is the reason for the additional P conjunct separate from the basic pointer mapping. It is not an artifact of virtual memory, but one of expression evaluation only. In essence, this is a rule schema. P can be picked to be just strong enough for the evaluation of the expressions to succeed. For instance, if the left hand side were to contain one heap lookup for location x only, we would chose P to be of the form $x \mapsto r$ . The rule is sound for too small and too large P: either the precondition is false and the triple trivially true, or the precondition is merely stronger than it needs to be. The proof of the assignment rule proceeds by unfolding the definitions and observing that the postcondition is established by the memory update, noting the fact that the ptable-trace from the precondition fully accounts for all page table entries that are relevant in the postcondition and that P is preserved, because it is Fig. 5. The proof rules for Mapped Separation Logic. separate from the heap in which the update occurs. Since the physical address of the write is separate from the page table lookup for this address and from P, the translation layer for their heaps is not affected by the write. To reason about page table updates we need a slightly stronger rule that unfolds the virtual-to-value mapping and lets us talk about the physical address p: Reasoning with the new mapping predicates is similar to abstract-predicate style reasoning [13] if we never unfold their definitions in client proofs. As we will see in the case study, we only need to do this locally if we are reasoning about changes to the page table and we are interested in the page that is being modified. This obviously heavily depends on the page table encoding. Application level reasoning can proceed fully abstractly. The second interesting proof rule is the frame rule that allows global reasoning based on local proofs. In many ways it can be seen as the core of Separation Logic. Calcagno et al. [3] provide an abstract framework for identifying a logic as Separation Logic and distill out the central property of locality. In their setting, our separation algebra is the common heap monoid where the binary operation is heap merge lifted to the $heap \times pptr$ type. Our definition of separating conjunction then coincides with the one in the framework, and to show that our logic is a Separation Logic, we only need to show that all actions in the programming language are local. Locality is equivalent to the combination of safety monotonicity and the frame property. In our setting, these two are: Fig. 6. The page table interface. **Lemma 1 (Safety Monotonicity).** If a small state provides enough resources to run a command c, then so does a larger state. If $\langle c, (h ++ h', r) \rangle \longrightarrow \mathsf{None}$ and $h \perp h'$ then $\langle c, (h, r) \rangle \longrightarrow \mathsf{None}$ . **Lemma 2 (Frame Monotonicity).** Execution of a command can be traced back to a smaller part of the state as long as the command executes at all on the smaller state. If $\neg \langle c,(h_0, r) \rangle \longrightarrow \text{None}$ and $\langle c,(h_0 + h_1, r) \rangle \longrightarrow \lfloor (h', r) \rfloor$ and $h_0 \perp h_1$ then $\exists h_0'$ . $h' = h_0' + h_1 \wedge \langle c,(h_0, r) \rangle \longrightarrow \lfloor (h_0', r) \rfloor$ . We have not formalised the full abstract, relational treatment of the framework by Calcagno et al., but shown the above two properties and the implied frame rule directly by induction on the evaluation of commands. Fig. 6 gives the page table interface constraints that we promised in Sect. 3. These rules need to be proved about ptable-lift and ptable-trace for a new page table instantiation in order to use the abstract logic presented before. The first two rules are the frame and monotonicity property on ptable-lift. The third rule states that if the domain of ptable-lift does not change, neither does ptable-trace. The last two rules state that updates to the heap outside the trace affect neither the lifting nor the trace of the page table. We have proved the rule in Fig. 6 for the one-level page table instantiation in the examples. #### 6 Case Study In this section, we present a small page allocation and assignment routine one might see in operating system services, mapping a *frame*, the physical equivalent of a page, to some address in virtual memory. The program appears in Fig. 7 with simplified syntax. Frame availability information is stored in a frame table, which contains one entry per frame in the system, marking it as used or unused. In our program, line 1 attempts to find a free frame's entry in the free frame list. An empty list causes an erroneous return at line 7. Line 3 removes the head of the list. Line 4 calculates the number of the frame from its entry. Upon successfully allocating a frame, line 5 updates the page table with the appropriate entry mapping page-addr to the new frame. We have proved that the program conforms to the following specification: ``` 1. fte := ft_free_list; 2. IF fte != NULL THEN 3. ft_free_list := *ft_free_list; 4. frame := &fte - &frame_table; 5. *(ptable + (a2p page_addr)) := f2a frame OR valid_pmask; 6. ret_val := 0 7. ELSE ret_val := -1 ``` Fig. 7. A simple page table manipulating program. The functions a2p and f2a convert addresses to page numbers and frame numbers to addresses by respectively dividing and multiplying by 4096. Since the language in this paper is purely heap based, we use $var_v \mapsto var$ to denote that a variable var has a specific value and to represent var in Fig. 7. We assume that the page table is accessible from ptable, and the frame table lies at frame-table. Further, we have a non-empty free list starting at first-free, where each address in the list indicates the presence of a frame. Finally, we require that the page table entry that is used to resolve a lookup of page-addr is allocated, and accessible from our address space. We additionally require that page-addr is aligned to the page size. As a result of executing the program, the free frame list becomes shorter and the page at page-addr is fully accessible. The latter is: ``` \begin{array}{ll} \mathsf{page\text{-}mapped} \ vp & \equiv \mathsf{entire\text{-}page} \ vp \mapsto -\ \wedge^* \ \mathsf{consume\text{-}slices} \ vp \ (\mathcal{U} - \mathsf{entire\text{-}page} \ vp) \\ \mathsf{consume\text{-}slice} \ vp \ slc & \equiv \lambda(h, \ r). \ \mathsf{dom} \ h = \mathsf{ptable\text{-}trace} \ (\mathsf{h\text{-}view} \ h \ slc) \ r \ vp \ \times \{slc\} \\ \mathsf{consume\text{-}slices} \ vp \ S & \equiv \mathsf{fold} \ \mathsf{op} \ \wedge^* \ (\mathsf{consume\text{-}slice} \ vp) \ \square \ S \\ \mathsf{entire\text{-}page} \ vp & \equiv \{vp..\mathsf{VPtr} \ (\mathsf{vptr\text{-}val} \ vp \ + \ \theta x\theta FFF)\} \end{array} ``` Our separation logic statements all work on heaps of a precise size. Using up the entire page table entry in our precondition means we must likewise use it all in the postcondition. A page table entry maps 4096 addresses, but is made up of more slices. Stating that those addresses are mapped does not consume all the slices. The difference in heap size is made up by consume-slices. The actual mapping-in step from line 5 of our program performs a write to the page table at pt + a2p page-addr. In addition to our assignment rule, we have shown another property that allows us to to conclude from the post-state of line 5 that the page at page-addr is now completely mapped: ``` entire-frame (PPtr (f2\ frame)) \to - \wedge^* pt-map page-addr frame \lfloor \longrightarrow \rfloor page-mapped page-addr pt-map page-addr frame \equiv \lambda(h,\ r). (PPtr (pptr-val r + a2p page-addr) \mapsto (f2a frame OR 1)) (h,\ r) ``` This rule is the only place in the case study where we had to unfold page table definitions and reason directly about the encoding. All other reasoning used Separation Logic rules only. In order to check successful interaction with the newly mapped page, we added an extra segment to our program. ``` *page_addr := 0xFF; *(page_addr + 3) := *page_addr + 2 ``` If executed on a state with offsets 0 and 3 in the page mapped and allocated: ``` \{page-addr \mapsto - \wedge^* (page-addr + 3) \mapsto -\} ``` it results in those offsets set to 0xFF and 0x101: ``` \{page-addr \mapsto \theta xFF \wedge^* (page-addr + 3) \mapsto \theta x1\theta 1\} ``` These two offsets are part of the page at page-addr; the program fragment does not change anything else. As we can rewrite ``` page-mapped p=p\mapsto -\wedge^*(p+3)\mapsto -\wedge^* page-mapped \{p,\,p+3\} p ``` we can invoke the frame rule and include the rest of the state. Our case study shows that our logic allows abstract reasoning, even in the presence of page table manipulation. It constitutes approximately 1500 lines of proof script, on top of a 4500 line framework. Many of those 1500 lines deal with setting up and dealing with a deep embedding. #### 7 Related work The primary focus of this work is enhancement of Separation Logic, originally conceived by O'Hearn, Reynolds et al. [8,14]. Separation logic has previously been formalised in mechanised theorem proving systems [17]. We enhance these models with the ability to reason about properties on virtual memory. Previous work in OS kernel verification has involved verifying the virtual memory subsystem [9,4,7]. Reasoning about programs running under virtual memory, however, especially the operating systems which control it, remains mostly unexplored. The challenges of reasoning about virtual memory are explored in the development of the Robin micro-hypervisor [15]. Like our work, the developers of Robin aim to use a single semantics to describe all forms of memory access which simplifies significantly in the well-behaved case. They focus on reasoning about "plain memory" in which no virtual aliasing occurs and split it into read-only and read-write regions, to permit reading the page table while in plain memory. They do not use Separation Logic notation. Our work is more abstract. We do not explicitly define "plain memory". Rather the concept emerges from the requirements and state. As mentioned previously, Alkassar et al. [1] have recently proved the correctness of a kernel page fault handler. As in our setting, they use a single level page table and prove that the page fault handler establishes the illusion to the user of a plain memory abstraction, swapping in pages from disk as required. The proof establishes simulation between those two abstraction layers, with the full complexity of the page table encoding visible at the lower level for the whole verification. The work presented in this paper focuses on a logic for reasoning about virtual memory instead. The aim is to make verifications such as the above easier and more productive. Tuch et al. demonstrated the extension of Separation Logic to reasoning about C programs involving pointer manipulation [16]. Presently, our work uses a simplified machine model with only one type. We believe our framework is orthogonal and can be instantiated readily to the C model. #### 8 Conclusion and Future Work We have presented an extension of Separation Logic which allows reasoning about virtual memory and processes running within. The logic fully supports the frame rule as well as the other Separation Logic proof rules and allows for a convenient representation of predicates on memory at three levels: the virtual map, the physical heap and the virtual address space. We have presented a small case study that demonstrates the applicability of the logic to OS level page table code as well as client code using the page table mechanism. For analysing and presenting the logic in this paper we chose a simplified machine and page table implementation. The logic does not depend on the implementation of either. Although our framework does not presently take access rights into account, it can be easily extended to encompass them. We have significantly extended our previous work on virtual memory and have managed to fully hide the complexity of virtual memory reasoning for code that does not directly modify the page table. We also have shown that reasonably high-level reasoning is still possible for code that does. The concepts in the logic are close to the mental model kernel programmers have of virtual memory. The next steps are to fully instantiate this model to the C programming language and apply it to the verification of the seL4 microkernel. Acknowledgements We thank Harvey Tuch, Simon Winwood, and Thomas Sewell for discussions and comments on earlier versions of this paper as well as Hongsoek Yang for sharing his insight into different formulations of Separation Logic. #### References 1. E. Alkassar, N. Schirmer, and A. Starostin. Formal pervasive verification of a paging mechanism. In *Proci4th Int'l Conference on Tools and Algorithms for the* - Construction and Analysis of Systems (TACAS'08), LNCS, Budapest, Hungary, Apr. 2008. Springer. to appear. - 2. R. Bornat, C. Calcagno, P. O'Hearn, and M. Parkinson. Permission accounting in separation logic. In *POPL '05: Proc32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages*, pages 259–270. ACM, 2005. - 3. C. Calcagno, P. W. O'Hearn, and H. Yang. Local action and abstract separation logic. In *LICS '07: Proceedings of the 22nd Annual IEEE Symposium on Logic in Computer Science*, pages 366–378. IEEE Computer Society, 2007. - I. Dalinger, M. A. Hillebrand, and W. J. Paul. On the verification of memory management mechanisms. In D. Borrione and W. J. Paul, editors, CHARME, volume 3725 of LNCS, pages 301–316. Springer, 2005. - P. Derrin, K. Elphinstone, G. Klein, D. Cock, and M. M. T. Chakravarty. Running the manual: An approach to high-assurance microkernel development. In *Proc.* ACM SIGPLAN Haskell WS, Portland, OR, USA, Sept. 2006. - K. Elphinstone, G. Klein, P. Derrin, T. Roscoe, and G. Heiser. Towards a practical, verified kernel. In *Proc. 11th Workshop on Hot Topics in Operat*ing Systems, page 6, San Diego, CA, USA, May 2007. Online proceedings at http://www.usenix.org/events/hotos07/tech/. - M. Hillebrand. Address Spaces and Virtual Memory: Specification, Implementation, and Correctness. PhD thesis, Saarland University, Saarbrücken, 2005. - S. S. Ishtiaq and P. W. O'Hearn. BI as an assertion language for mutable data structures. In POPL '01: Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 14–26. ACM, 2001. - G. Klein and H. Tuch. Towards verified virtual memory in L4. In K. Slind, editor, TPHOLs Emerging Trends '04, Park City, Utah, USA, 2004. - R. Kolanski. A logic for virtual memory. In R. Huuck, G. Klein, and B. Schlich, editors, Proc. 3rd Int'l Workshop on Systems Software Verification (SSV'08), ENTCS, pages 55–70. Elsevier, Feb. 2008. to appear. - 11. T. Nipkow, L. Paulson, and M. Wenzel. *Isabelle/HOL A Proof Assistant for Higher-Order Logic*, volume 2283 of *LNCS*. Springer, 2002. - 12. P. W. O'Hearn, H. Yang, and J. C. Reynolds. Separation and information hiding. In *POPL '04: Proc31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages*, pages 268–280. ACM, 2004. - M. Parkinson and G. Bierman. Separation logic and abstraction. In POPL '05: Proc. 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 247–258. ACM, 2005. - 14. J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In *Proc. 17th IEEE Symposium on Logic in Computer Science*, pages 55–74, 2002. - 15. H. Tews. Formal methods in the Robin project: Specification and verification of the Nova microhypervisor. In C/C++ Verification Workshop, Technical Report ICIS-R07015, pages 59–68, Oxford, UK, July 2007. Radboud University Nijmegen. http://pms.cs.ru.nl/iris-diglib/src/getContent.php?id=2007-Tews-cv. - H. Tuch, G. Klein, and M. Norrish. Types, bytes, and separation logic. In M. Hofmann and M. Felleisen, editors, POPL '07: Proc. 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 97–108. ACM, 2007. - T. Weber. Towards mechanized program verification with separation logic. In J. Marcinkowski and A. Tarlecki, editors, Computer Science Logic - 18th Int'l Workshop, CSL 2004, volume 3210 of LNCS, pages 250-264. Springer, 2004.