Trustworthy Systems

Our Partners Utrecht University

Cogent

Cogent is a purely functional language aimed at reducing the cost of verified systems code.

Context

Fundamental to our approach for developing trustworthy systems is to use a minimal trusted computing base (TCB) and formally prove its correctness. We build such systems on top of the seL4 microkernel, for which we have developed proofs of implementation correctness and enforcement of isolation properties, required for security and safety.

However, a real system's TCB is bigger than the microkernel. In particular, it will contain operating system services, such as file systems, device drivers, and network stacks. In order to achieve overall trustworthiness, these services must be verified. As manual verification is costly and scales poorly, we are investigating ways to (partially) automate the verification of system services. We are using file systems as a first case study.

Approach: Co-generation of code and proof

The core idea of Cogent is to specify a code module's functionality in a high-level language, and from that generate low-level (C) code, together with a proof of the correctness of the low-level code. This means the programmer can focus on the logic of the problem, i.e. the algorithms, without having to worry about error-prone tasks such as memory management or unwinding of state in the case of unsuccessful operations.

Importantly, the high-level language the provides a more convenient and productive environment for proving the correctness of the overall operation of the service. This is similar to the (manual) approach we took with seL4, where we did a two-step refinement proof: (1) from a high-level functional specification to a low-level executable spec, and (2) from the executable spec to C code; Cogent automates the second step.

We use file system implementations to drive the design and implementation of the Cogent framework. We can make a number of observations about file systems (at least some of which also hold for other systems code, such as network stacks):

  1. Modern operating systems have a layered architecture, where services are stacks of abstraction levels, file systems and network stacks are examples. Such a logical structure lends itself to a modular implementation, with components interfaces matching the abstraction levels. For verification, such modularity is a big advantage, as components with well-defined interfaces can be verified individually. Given our earlier findings that verification effort scales quadratically with code size, this is key to reducing overall verification cost;
  2. Due to the nature of FSs, their implementations tend not to have much sharing among data structures. This feature is well captured by a linear type system, which directly enables easy memory management, and as a consequence build a straight linkage between a pure high-level semantics and a stateful low-level semantics;
  3. The control logic is cluttered and obscured with error-handling code; in some modules, error-handling makes up 80% of all code. Much of this is boilerplate (which is nevertheless error-prone), and can be abstracted away into a domain-specific language (DSL). This lets the implementer focus on the high-level logic;
  4. In-memory data structures for maintaining the file system information can largely be handled by suitable abstract data types (ADTs), which can be implemented and verified separately;
  5. A large subset of FS code complexity can be easily described using DSLs, namely code for serialising and de-serialising between rich, in-memory data structures; flat, on-medium representations; and code implementing the actual file-system logic.

The below figure shows the main components of our framework. Dark grey boxes are the tools that we employ in this framework; light grey boxes represent code or specifications that are automatically generated; white boxes represent semantics; block arrows represent the proofs which connect them; green arrows are the parts that we have managed to fully automate; finally, dotted boxes indicate components which the FS implementer must provide. While the correctness specification, proofs, and implementation (dotted boxes and arrows in the figure) are specific to the FS being built, the ADT library is reusable across many different FSs.

file system verification methodology

As indicated in the diagram, the file system implementer must provide four distinct components:

The Cogent compiler takes Cogent source code as input and produces, from the bottom of the figure above: the C code, the semantics of the C code expressed in Isabelle/Simpl, the same expressed as a monadic functional program, a monomorphic deep embedding of the Cogent program in A normal form, a polymorphic A-normal deep embedding of the same, an A-normal shallow embedding, and finally a 'neat' shallow embedding of the Cogent program that is syntactically close to the Cogent source code. Several theorems rely on the Cogent program being well-typed, which we prove automatically using type inference information from the compiler. The solid arrows on the right-hand side of the figure represent refinement proofs. The only arrow that is not formally verified is the one crossing from C code into Isabelle/HOL at the bottom (the red arrow) — this is the C parser, which is a mature verification tool used in a number of large-scale verifications (e.g. seL4).

For well-typed Cogent programs, we prove a series of refinement relations. The refinement proofs state that every behaviour exhibited by the C code can also be exhibited by the Cogent code and, furthermore, that the C code is always well-defined, including that e.g. the generated C code never dereferences a null pointer, and never causes signed overflow. It also implies that the generated C code is type-safe and memory-safe, meaning the code will never try to dereference an invalid pointer or try to dereference two aliasing pointers of incompatible types. From the generated proofs, We additionally get guarantees that the generated code handles all error cases, is free of memory leaks, and never double-frees a pointer.

A key component to the overall refinement proof comes directly from Cogent’s linear type system. As linear types ensure that only one reference exists at a time to a heap object, a function that destructively updates an object may be given the type of a pure function, as it is impossible to distinguish destructive update from a purely functional update without aliasing. This allows us to assign two different semantic interpretations to the same code: A purely functional value semantics, which is is suitable for high level verification using equational reasoning, and an imperative update semantics, including pointers and a mutable heap, which more closely resembles the semantics of the generated C. We prove once and for all that, for any well-typed program, the update semantics is a refinement of the value semantics.

Research contributions

The project explores a new methodology for designing and formally verifying the correctness of systems code, based on a new language Cogent. We use file systems for case studies. Specific contributions include:

Status

The Cogent project, besides its initial goal of reducing the effort of developing formally verified systems software, has become a platform for a wider range of research activities.

File systems

To date we have implemented several file systems in Cogent:

All file system implementations can be deployed in Linux, they sit below Linux's virtual file system switch (VFS) module, using C stub code to provide the top-level entry points expected by the VFS. We are also using them as native components on seL4.

The Cogent implementations of the file systems share a common library of ADTs that includes fixed-length arrays for words and structures, simple iterators for implementing for-loops, and Cogent stubs for accessing a range of Linux APIs such as the buffer cache and its native red-black tree implementation, plus iterators for each ADT.

For BilbyFS we have connected the low-level proofs to manual proofs of the correctness of two high-level functions: sync() and iget(), to demonstrate the ease of verifying the high-level functionality. More complete verification will happen in the future.

Other systems components

We are currently actively looking into applying our experience acquired in developing file systems to other systems components.

Cogent language and compiler

Since our ICFP'16 paper, there have been a lot of improvements to the Cogent language design and to the compiler. Most notably:

Dargent

Dargent is a data layout description language extension to Cogent. Dargent allows users to customise the memory layouts of Cogent algebraic datatypes down to the bits and bytes. Therefore, Cogent types can be laid out in a more idiomatic manner, resulting in more efficient C code and better compliance with protocols, standards and interfaces, while retaining the abstraction that algebraic datatypes have. Dargent, albeit sharing a lot in common with data description languages (DSLs), it solves a very different problem and should not be confused with them. To learn more, please read our recently published Dargent paper (POPL 2023). In this paper, we introduce the Dargent language, show how we integrate Dargent into the existing certifying compilation framework, and present several case-studies using Dargent.

Property-based testing

Formally proving functional correctness properties of systems code is expensive, and is vulnerable to changes in the source programs. In contrast, property-based testing is lightweight and can be used as a temporary substitute for formal proofs, especially in the development process. The properties used in testing should match closely with the lemmas to be proved, thus the effort spent in testing will not be wasted when coming to formal verification.

Dependently typed Cogent

Dependent type systems are attractive for their expressiveness and proof-carrying nature. We are exploring adding a restricted form of dependent types to Cogent. The research challenge lies in the integration between dependent types and uniqueness types.

Cogent standard library

We are redesigning the Cogent standard library for accessing the Linux kernel. The new design has a higher-level abstraction over the Linux kernel interfaces, easing the specification and proof of systems code in Cogent.

Talks

An ICFP'16 talk given by Liam O'Connor can be viewed here. The talk gives a high-level overview of the project and our work. If you are new to Cogent (or not, even), you are highly recommended to start your journey here.

Code

Code and documentation for this project can be found on Github.

Contributing

The Contributor License Agreement can be downloaded here. If you have larger changes or additions, it might be a good idea to get in contact with us, so we can help you get started.

People

Past

Publications

Abstract PDF Zilin Chen, Ambroise Lafont, Liam O'Connor, Gabriele Keller, Craig McLaughlin, Vincent Jackson and Christine Rizkallah
Dargent: A silver bullet for verified data layout refinement
Proc. ACM Program. Lang., Volume 7, Number POPL, January, 2023
Abstract PDF Zilin Chen
Towards a practical high-assurance systems programming language
PhD Thesis, UNSW, Sydney, Australia, 2023
Abstract PDF Zilin Chen, Christine Rizkallah, Liam O'Connor, Partha Susarla, Gerwin Klein, Gernot Heiser and Gabriele Keller
Property-based testing: Climbing the stairway to verification
ACM SIGPLAN International Conference on Software Language Engineering, Auckland, New Zealand, December, 2022
Distinguished Artifact Award
Abstract PDF Louis Cheung, Liam O'Connor and Christine Rizkallah
Overcoming restraint: Composing verification of foreign functions with Cogent
Proceedings of the 11th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 13–26, New York, NY, USA, 2022
Abstract PDF Liam O'Connor, Zilin Chen, Christine Rizkallah, Vincent Jackson, Sidney Amani, Gerwin Klein, Toby Murray, Thomas Sewell and Gabriele Keller
Cogent: Uniqueness types and certifying compilation
Journal of Functional Programming, Volume 31, 2021
Abstract PDF Gernot Heiser, Gerwin Klein and June Andronick
seL4 in Australia: From research to real-world trustworthy systems
Communications of the ACM, Volume 63, Issue 4, pp. 72-75, April, 2020
Abstract PDF Liam O'Connor
Type systems for systems types
PhD Thesis, UNSW, Sydney, Australia, August, 2019
Abstract PDF Zilin Chen, Matt Di Meglio, Liam O'Connor, Partha Susarla Ajay, Christine Rizkallah and Gabriele Keller
A data layout description language for Cogent
at PriSC, Lisbon, Portugal, January, 2019
Abstract PDF Liam O'Connor, Zilin Chen, Partha Susarla Ajay, Christine Rizkallah, Gerwin Klein and Gabriele Keller
Bringing effortless refinement of data layouts to Cogent
International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, pp. 134-149, Limassol, Cyprus, November, 2018
Abstract PDF Zilin Chen, Liam O'Connor, Gabriele Keller, Gerwin Klein and Gernot Heiser
The Cogent case for property-based testing
Workshop on Programming Languages and Operating Systems (PLOS), pp. 1-7, Shanghai, China, October, 2017
Abstract PDF Gerwin Klein, June Andronick, Gabriele Keller, Daniel Matichuk, Toby Murray and Liam O'Connor
Provably trustworthy systems
Philosophical Transactions of the Royal Society A, Volume 375, Issue 2104, pp. 1-23, September, 2017
Abstract PDF Liam O'Connor, Zilin Chen, Christine Rizkallah, Sidney Amani, Japheth Lim, Toby Murray, Yutaka Nagashima, Thomas Sewell and Gerwin Klein
Refinement through restraint: Bringing down the cost of verification
International Conference on Functional Programming, Nara, Japan, September, 2016
Abstract PDF Sidney Amani
A methodology for trustworthy file systems
PhD Thesis, CSE, UNSW, Sydney, Australia, August, 2016
Abstract PDF Christine Rizkallah, Japheth Lim, Yutaka Nagashima, Thomas Sewell, Zilin Chen, Liam O'Connor, Toby Murray, Gabriele Keller and Gerwin Klein
A framework for the automatic formal verification of refinement from Cogent to C
International Conference on Interactive Theorem Proving, Nancy, France, August, 2016
Abstract PDF Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O'Connor, Joel Beeren, Yutaka Nagashima, Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein and Gernot Heiser
Cogent: verifying high-assurance file system implementations
International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 175–188, Atlanta, GA, USA, April, 2016
Abstract PDF Sidney Amani and Toby Murray
Specifying a realistic file system
Workshop on Models for Formal Analysis of Real Systems, pp. 1–9, Suva, Fiji, November, 2015
Abstract PDF Liam O'Connor, Gabriele Keller, Sidney Amani, Toby Murray, Gerwin Klein, Zilin Chen and Christine Rizkallah
CDSL version 1: Simplifying verification with linear types
Technical Report, NICTA, October, 2014
Abstract
Slides
PDF Gabriele Keller, Toby Murray, Sidney Amani, Liam O'Connor, Zilin Chen, Leonid Ryzhyk, Gerwin Klein and Gernot Heiser
File systems deserve verification too!
Workshop on Programming Languages and Operating Systems (PLOS), pp. 1–7, Farmington, Pennsylvania, USA, November, 2013
A revised version of this paper was published in Operating Systems Review, Volume 48, Issue 1, January 2014, pages 58-64.
Abstract
Slides
PDF Sidney Amani, Leonid Ryzhyk and Toby Murray
Towards a fully verified file system
Poster presentation at EuroSys Doctoral Workshop, Bern, Switzerland, April, 2012