Cogent is a purely functional language aimed at reducing the cost of verified systems code.
Fundamental to our approach for developing trustworthy systems is to use a minimal trusted computing base (TCB) and formally prove its correctness. We build such systems on top of the seL4 microkernel, for which we have developed proofs of implementation correctness and enforcement of isolation properties, required for security and safety.
However, a real system's TCB is bigger than the microkernel. In particular, it will contain operating system services, such as file systems, device drivers, and network stacks. In order to achieve overall trustworthiness, these services must be verified. As manual verification is costly and scales poorly, we are investigating ways to (partially) automate the verification of system services. We are using file systems as a first case study.
The core idea of Cogent is to specify a code module's functionality in a high-level language, and from that generate low-level (C) code, together with a proof of the correctness of the low-level code. This means the programmer can focus on the logic of the problem, i.e. the algorithms, without having to worry about error-prone tasks such as memory management or unwinding of state in the case of unsuccessful operations.
Importantly, the high-level language the provides a more convenient and productive environment for proving the correctness of the overall operation of the service. This is similar to the (manual) approach we took with seL4, where we did a two-step refinement proof: (1) from a high-level functional specification to a low-level executable spec, and (2) from the executable spec to C code; Cogent automates the second step.
We use file system implementations to drive the design and implementation of the Cogent framework. We can make a number of observations about file systems (at least some of which also hold for other systems code, such as network stacks):
The below figure shows the main components of our framework. Dark grey boxes are the tools that we employ in this framework; light grey boxes represent code or specifications that are automatically generated; white boxes represent semantics; block arrows represent the proofs which connect them; green arrows are the parts that we have managed to fully automate; finally, dotted boxes indicate components which the FS implementer must provide. While the correctness specification, proofs, and implementation (dotted boxes and arrows in the figure) are specific to the FS being built, the ADT library is reusable across many different FSs.
As indicated in the diagram, the file system implementer must provide four distinct components:
The Cogent compiler takes Cogent source code as input and produces, from the bottom of the figure above: the C code, the semantics of the C code expressed in Isabelle/Simpl, the same expressed as a monadic functional program, a monomorphic deep embedding of the Cogent program in A normal form, a polymorphic A-normal deep embedding of the same, an A-normal shallow embedding, and finally a 'neat' shallow embedding of the Cogent program that is syntactically close to the Cogent source code. Several theorems rely on the Cogent program being well-typed, which we prove automatically using type inference information from the compiler. The solid arrows on the right-hand side of the figure represent refinement proofs. The only arrow that is not formally verified is the one crossing from C code into Isabelle/HOL at the bottom (the red arrow) — this is the C parser, which is a mature verification tool used in a number of large-scale verifications (e.g. seL4).
For well-typed Cogent programs, we prove a series of refinement relations. The refinement proofs state that every behaviour exhibited by the C code can also be exhibited by the Cogent code and, furthermore, that the C code is always well-defined, including that e.g. the generated C code never dereferences a null pointer, and never causes signed overflow. It also implies that the generated C code is type-safe and memory-safe, meaning the code will never try to dereference an invalid pointer or try to dereference two aliasing pointers of incompatible types. From the generated proofs, We additionally get guarantees that the generated code handles all error cases, is free of memory leaks, and never double-frees a pointer.
A key component to the overall refinement proof comes directly from Cogent’s linear type system. As linear types ensure that only one reference exists at a time to a heap object, a function that destructively updates an object may be given the type of a pure function, as it is impossible to distinguish destructive update from a purely functional update without aliasing. This allows us to assign two different semantic interpretations to the same code: A purely functional value semantics, which is is suitable for high level verification using equational reasoning, and an imperative update semantics, including pointers and a mutable heap, which more closely resembles the semantics of the generated C. We prove once and for all that, for any well-typed program, the update semantics is a refinement of the value semantics.
The project explores a new methodology for designing and formally verifying the correctness of systems code, based on a new language Cogent. We use file systems for case studies. Specific contributions include:
The Cogent project, besides its initial goal of reducing the effort of developing formally verified systems software, has become a platform for a wider range of research activities.
To date we have implemented several file systems in Cogent:
All file system implementations can be deployed in Linux, they sit below Linux's virtual file system switch (VFS) module, using C stub code to provide the top-level entry points expected by the VFS. We are also using them as native components on seL4.
The Cogent implementations of the file systems share a common library of ADTs that includes fixed-length arrays for words and structures, simple iterators for implementing for-loops, and Cogent stubs for accessing a range of Linux APIs such as the buffer cache and its native red-black tree implementation, plus iterators for each ADT.
For BilbyFS we have connected the low-level proofs to manual
proofs of the correctness of two high-level functions:
iget(), to demonstrate the
ease of verifying the high-level functionality. More complete
verification will happen in the future.
We are currently actively looking into applying our experience acquired in developing file systems to other systems components.
Since our ICFP'16 paper, there have been a lot of improvements to the Cogent language design and to the compiler. Most notably:
Dependent type systems are attractive for their expressiveness and proof-carrying nature. We are exploring adding a restricted form of dependent types to Cogent. The research challenge lies in the integration between dependent types and uniqueness types.
Dargent is a data layout description language atop Cogent. Dargent allows users to customise the Cogent compilation of algebraic datatypes down to low-level C code. This will make compiled C code more efficient and the integration with other systems components written in C seamless. Dargent can also be used for data marshalling and unmarshalling in systems implementations.
Formally proving functional correctness properties of systems code is expensive, and is vulnerable to changes in the source programs. In contrast, property-based testing is lightweight and can be used as a temporary substitute for formal proofs, especially in the development process. The properties used in testing should match closely with the lemmas to be proven, thus the effort spent in testing will not be wasted when coming to formal verification.
We are redesigning the Cogent standard library for accessing the Linux kernel. The new design has a higher-level abstraction over the Linux kernel interfaces, easing the specification and proof of systems code in Cogent.
An ICFP'16 talk given by Liam O'Connor can be viewed here. The talk gives a high-level overview of the project and our work. If you are new to Cogent (or not, even), you are highly recommended to start your journey here.
Code and documentation for this project can be found on Github.
The Contributor License Agreement can be downloaded here. If you have larger changes or additions, it might be a good idea to get in contact with us, so we can help you get started.
We also have a list of potential topics for students of varies stages in their careers. Please contact us for more information.
|Louis Cheung, Liam O'Connor and Christine Rizkallah|
Overcoming restraint: Composing verification of foreign functions with Cogent
Proceedings of the 11th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 13–26, New York, NY, USA, 2022
|Liam O’Connor, Zilin Chen, Christine Rizkallah, Vincent Jackson, Sidney Amani, Gerwin Klein, Toby Murray, Thomas Sewell and Gabriele Keller|
Cogent: Uniqueness types and certifying compilation
Journal of Functional Programming, Volume 31, 2021
|Gernot Heiser, Gerwin Klein and June Andronick|
seL4 in Australia: From research to real-world trustworthy systems
Communications of the ACM, Volume 63, Issue 4, pp. 72-75, April, 2020
Type systems for systems types
PhD Thesis, UNSW, Sydney, Australia, August, 2019
|Zilin Chen, Matt Di Meglio, Liam O'Connor, Partha Susarla Ajay, Christine Rizkallah and Gabriele Keller|
A data layout description language for Cogent
at PriSC, Lisbon, Portugal, January, 2019
|Liam O'Connor, Zilin Chen, Partha Susarla Ajay, Christine Rizkallah, Gerwin Klein and Gabriele Keller|
Bringing effortless refinement of data layouts to Cogent
International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, pp. 134-149, Limassol, Cyprus, November, 2018
|Zilin Chen, Liam O'Connor, Gabriele Keller, Gerwin Klein and Gernot Heiser|
The Cogent case for property-based testing
Workshop on Programming Languages and Operating Systems (PLOS), pp. 1-7, Shanghai, China, October, 2017
|Gerwin Klein, June Andronick, Gabriele Keller, Daniel Matichuk, Toby Murray and Liam O'Connor|
Provably trustworthy systems
Philosophical Transactions of the Royal Society A, Volume 375, Issue 2104, pp. 1-23, September, 2017
|Liam O'Connor, Zilin Chen, Christine Rizkallah, Sidney Amani, Japheth Lim, Toby Murray, Yutaka Nagashima, Thomas Sewell and Gerwin Klein|
Refinement through restraint: Bringing down the cost of verification
International Conference on Functional Programming, Nara, Japan, September, 2016
A methodology for trustworthy file systems
PhD Thesis, CSE, UNSW, Sydney, Australia, August, 2016
|Christine Rizkallah, Japheth Lim, Yutaka Nagashima, Thomas Sewell, Zilin Chen, Liam O'Connor, Toby Murray, Gabriele Keller and Gerwin Klein|
A framework for the automatic formal verification of refinement from Cogent to C
International Conference on Interactive Theorem Proving, Nancy, France, August, 2016
|Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O'Connor, Joel Beeren, Yutaka Nagashima, Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein and Gernot Heiser|
Cogent: verifying high-assurance file system implementations
International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 175–188, Atlanta, GA, USA, April, 2016
|Sidney Amani and Toby Murray|
Specifying a realistic file system
Workshop on Models for Formal Analysis of Real Systems, pp. 1–9, Suva, Fiji, November, 2015
|Liam O'Connor, Gabriele Keller, Sidney Amani, Toby Murray, Gerwin Klein, Zilin Chen and Christine Rizkallah|
CDSL version 1: Simplifying verification with linear types
Technical Report, NICTA, October, 2014
||Gabriele Keller, Toby Murray, Sidney Amani, Liam O'Connor, Zilin Chen, Leonid Ryzhyk, Gerwin Klein and Gernot Heiser|
File systems deserve verification too!
Workshop on Programming Languages and Operating Systems (PLOS), pp. 1–7, Farmington, Pennsylvania, USA, November, 2013
A revised version of this paper was published in Operating Systems Review, Volume 48, Issue 1, January 2014, pages 58-64.