Cogent
Cogent is a purely functional language aimed at reducing the cost of verified systems code.
Context
Fundamental to our approach for developing trustworthy systems is to use a minimal trusted computing base (TCB) and formally prove its correctness. We build such systems on top of the seL4 microkernel, for which we have developed proofs of implementation correctness and enforcement of isolation properties, required for security and safety.
However, a real system's TCB is bigger than the microkernel. In particular, it will contain operating system services, such as file systems, device drivers, and network stacks. In order to achieve overall trustworthiness, these services must be verified. As manual verification is costly and scales poorly, we are investigating ways to (partially) automate the verification of system services. We are using file systems as a first case study.
Approach: Co-generation of code and proof
The core idea of Cogent is to specify a code module's functionality in a high-level language, and from that generate low-level (C) code, together with a proof of the correctness of the low-level code. This means the programmer can focus on the logic of the problem, i.e. the algorithms, without having to worry about error-prone tasks such as memory management or unwinding of state in the case of unsuccessful operations.
Importantly, the high-level language the provides a more convenient and productive environment for proving the correctness of the overall operation of the service. This is similar to the (manual) approach we took with seL4, where we did a two-step refinement proof: (1) from a high-level functional specification to a low-level executable spec, and (2) from the executable spec to C code; Cogent automates the second step.
We use file system implementations to drive the design and implementation of the Cogent framework. We can make a number of observations about file systems (at least some of which also hold for other systems code, such as network stacks):
- Modern operating systems have a layered architecture, where services are stacks of abstraction levels, file systems and network stacks are examples. Such a logical structure lends itself to a modular implementation, with components interfaces matching the abstraction levels. For verification, such modularity is a big advantage, as components with well-defined interfaces can be verified individually. Given our earlier findings that verification effort scales quadratically with code size, this is key to reducing overall verification cost;
- Due to the nature of FSs, their implementations tend not to have much sharing among data structures. This feature is well captured by a linear type system, which directly enables easy memory management, and as a consequence build a straight linkage between a pure high-level semantics and a stateful low-level semantics;
- The control logic is cluttered and obscured with error-handling code; in some modules, error-handling makes up 80% of all code. Much of this is boilerplate (which is nevertheless error-prone), and can be abstracted away into a domain-specific language (DSL). This lets the implementer focus on the high-level logic;
- In-memory data structures for maintaining the file system information can largely be handled by suitable abstract data types (ADTs), which can be implemented and verified separately;
- A large subset of FS code complexity can be easily described using DSLs, namely code for serialising and de-serialising between rich, in-memory data structures; flat, on-medium representations; and code implementing the actual file-system logic.
The below figure shows the main components of our framework. Dark grey boxes are the tools that we employ in this framework; light grey boxes represent code or specifications that are automatically generated; white boxes represent semantics; block arrows represent the proofs which connect them; green arrows are the parts that we have managed to fully automate; finally, dotted boxes indicate components which the FS implementer must provide. While the correctness specification, proofs, and implementation (dotted boxes and arrows in the figure) are specific to the FS being built, the ADT library is reusable across many different FSs.
As indicated in the diagram, the file system implementer must provide four distinct components:
- A high-level specification of FS functionality (in Isabelle/HOL);
- The proof that the Cogent generated high-level specification corresponds to the functional correctness specification of the FS;
- The implementation of the FS in Cogent;
- The implementation of ADTs and interfaces to OS in C, together with proofs that the implementation is correct with respect to the functional correctness specification (partially provided by us or other ADT authors).
The Cogent compiler takes Cogent source code as input and produces, from the bottom of the figure above: the C code, the semantics of the C code expressed in Isabelle/Simpl, the same expressed as a monadic functional program, a monomorphic deep embedding of the Cogent program in A normal form, a polymorphic A-normal deep embedding of the same, an A-normal shallow embedding, and finally a 'neat' shallow embedding of the Cogent program that is syntactically close to the Cogent source code. Several theorems rely on the Cogent program being well-typed, which we prove automatically using type inference information from the compiler. The solid arrows on the right-hand side of the figure represent refinement proofs. The only arrow that is not formally verified is the one crossing from C code into Isabelle/HOL at the bottom (the red arrow) — this is the C parser, which is a mature verification tool used in a number of large-scale verifications (e.g. seL4).
For well-typed Cogent programs, we prove a series of refinement relations. The refinement proofs state that every behaviour exhibited by the C code can also be exhibited by the Cogent code and, furthermore, that the C code is always well-defined, including that e.g. the generated C code never dereferences a null pointer, and never causes signed overflow. It also implies that the generated C code is type-safe and memory-safe, meaning the code will never try to dereference an invalid pointer or try to dereference two aliasing pointers of incompatible types. From the generated proofs, We additionally get guarantees that the generated code handles all error cases, is free of memory leaks, and never double-frees a pointer.
A key component to the overall refinement proof comes directly from Cogent’s linear type system. As linear types ensure that only one reference exists at a time to a heap object, a function that destructively updates an object may be given the type of a pure function, as it is impossible to distinguish destructive update from a purely functional update without aliasing. This allows us to assign two different semantic interpretations to the same code: A purely functional value semantics, which is is suitable for high level verification using equational reasoning, and an imperative update semantics, including pointers and a mutable heap, which more closely resembles the semantics of the generated C. We prove once and for all that, for any well-typed program, the update semantics is a refinement of the value semantics.
Research contributions
The project explores a new methodology for designing and formally verifying the correctness of systems code, based on a new language Cogent. We use file systems for case studies. Specific contributions include:
- Specification of FS correctness;
- The use of linearly typed Cogent language for real world FS implementation;
- Proof of FS operations correctness and how Cogent reduces the effort;
- The design and formal development of the Cogent language and its certifying compiler;
- The fully automatic refinement tower between C code and Cogent's language specification.
Status
The Cogent project, besides its initial goal of reducing the effort of developing formally verified systems software, has become a platform for a wider range of research activities.
File systems
To date we have implemented several file systems in Cogent:
- BilbyFS, a custom flash file system designed from scratch to be highly modular, to ease verification against a high-level functional specification one component at a time. BilbyFS strikes a balance between the simplicity of JFFS2 and the performance of UBIFS, the two most popular Linux flash file systems;
- ext2fs, a standard Linux file system. Our present ext2fs implementation is an almost feature-complete implementation of revision 1: it passes the Posix File System Test Suite, except for the ACL and symlink tests (as we have not implemented those features);
- vfat, a FAT port to Linux;
- f2fs, a native flash file system in Linux.
All file system implementations can be deployed in Linux, they sit below Linux's virtual file system switch (VFS) module, using C stub code to provide the top-level entry points expected by the VFS. We are also using them as native components on seL4.
The Cogent implementations of the file systems share a common library of ADTs that includes fixed-length arrays for words and structures, simple iterators for implementing for-loops, and Cogent stubs for accessing a range of Linux APIs such as the buffer cache and its native red-black tree implementation, plus iterators for each ADT.
For BilbyFS we have connected the low-level proofs to manual
proofs of the correctness of two high-level functions:
sync()
and iget()
, to demonstrate the
ease of verifying the high-level functionality. More complete
verification will happen in the future.
Other systems components
We are currently actively looking into applying our experience acquired in developing file systems to other systems components.
Cogent language and compiler
Since our ICFP'16 paper, there have been a lot of improvements to the Cogent language design and to the compiler. Most notably:
- A new type system, introducing a better subtyping relation to records and variants;
- A constraint-based surface language typechecker, along with an Isabelle formal proof showing the correctness properties of this typechecking algorithm;
- Various other language features which aim to improve the user-friendliness of the language.
Dargent
Dargent is a data layout description language extension to Cogent. Dargent allows users to customise the memory layouts of Cogent algebraic datatypes down to the bits and bytes. Therefore, Cogent types can be laid out in a more idiomatic manner, resulting in more efficient C code and better compliance with protocols, standards and interfaces, while retaining the abstraction that algebraic datatypes have. Dargent, albeit sharing a lot in common with data description languages (DSLs), it solves a very different problem and should not be confused with them. To learn more, please read our recently published Dargent paper (POPL 2023). In this paper, we introduce the Dargent language, show how we integrate Dargent into the existing certifying compilation framework, and present several case-studies using Dargent.
Property-based testing
Formally proving functional correctness properties of systems code is expensive, and is vulnerable to changes in the source programs. In contrast, property-based testing is lightweight and can be used as a temporary substitute for formal proofs, especially in the development process. The properties used in testing should match closely with the lemmas to be proved, thus the effort spent in testing will not be wasted when coming to formal verification.
Dependently typed Cogent
Dependent type systems are attractive for their expressiveness and proof-carrying nature. We are exploring adding a restricted form of dependent types to Cogent. The research challenge lies in the integration between dependent types and uniqueness types.
Cogent standard library
We are redesigning the Cogent standard library for accessing the Linux kernel. The new design has a higher-level abstraction over the Linux kernel interfaces, easing the specification and proof of systems code in Cogent.
Talks
An ICFP'16 talk given by Liam O'Connor can be viewed here. The talk gives a high-level overview of the project and our work. If you are new to Cogent (or not, even), you are highly recommended to start your journey here.
Code
Code and documentation for this project can be found on Github.
Contributing
The Contributor License Agreement can be downloaded here. If you have larger changes or additions, it might be a good idea to get in contact with us, so we can help you get started.
People
Past
|
Publications
Zilin Chen Towards a practical high-assurance systems programming language PhD Thesis, UNSW, Sydney, Australia, March, 2023 | ||
Zilin Chen, Ambroise Lafont, Liam O'Connor, Gabriele Keller, Craig McLaughlin, Vincent Jackson and Christine Rizkallah Dargent: A silver bullet for verified data layout refinement Proc. ACM Program. Lang., Volume 7, Number POPL, January, 2023 | ||
Zilin Chen, Christine Rizkallah, Liam O'Connor, Partha Susarla, Gerwin Klein, Gernot Heiser and Gabriele Keller Property-based testing: Climbing the stairway to verification ACM SIGPLAN International Conference on Software Language Engineering, Auckland, New Zealand, December, 2022 Distinguished Artifact Award | ||
Louis Cheung, Liam O'Connor and Christine Rizkallah Overcoming restraint: Composing verification of foreign functions with Cogent Proceedings of the 11th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 13–26, New York, NY, USA, 2022 | ||
Liam O'Connor, Zilin Chen, Christine Rizkallah, Vincent Jackson, Sidney Amani, Gerwin Klein, Toby Murray, Thomas Sewell and Gabriele Keller Cogent: Uniqueness types and certifying compilation Journal of Functional Programming, Volume 31, 2021 | ||
Gernot Heiser, Gerwin Klein and June Andronick seL4 in Australia: From research to real-world trustworthy systems Communications of the ACM, Volume 63, Issue 4, pp. 72-75, April, 2020 | ||
Liam O'Connor Type systems for systems types PhD Thesis, UNSW, Sydney, Australia, August, 2019 | ||
Zilin Chen, Matt Di Meglio, Liam O'Connor, Partha Susarla Ajay, Christine Rizkallah and Gabriele Keller A data layout description language for Cogent at PriSC, Lisbon, Portugal, January, 2019 | ||
Liam O'Connor, Zilin Chen, Partha Susarla Ajay, Christine Rizkallah, Gerwin Klein and Gabriele Keller Bringing effortless refinement of data layouts to Cogent International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, pp. 134-149, Limassol, Cyprus, November, 2018 | ||
Zilin Chen, Liam O'Connor, Gabriele Keller, Gerwin Klein and Gernot Heiser The Cogent case for property-based testing Workshop on Programming Languages and Operating Systems (PLOS), pp. 1-7, Shanghai, China, October, 2017 | ||
Gerwin Klein, June Andronick, Gabriele Keller, Daniel Matichuk, Toby Murray and Liam O'Connor Provably trustworthy systems Philosophical Transactions of the Royal Society A, Volume 375, Issue 2104, pp. 1-23, September, 2017 | ||
Liam O'Connor, Zilin Chen, Christine Rizkallah, Sidney Amani, Japheth Lim, Toby Murray, Yutaka Nagashima, Thomas Sewell and Gerwin Klein Refinement through restraint: Bringing down the cost of verification International Conference on Functional Programming, Nara, Japan, September, 2016 | ||
Sidney Amani A methodology for trustworthy file systems PhD Thesis, CSE, UNSW, Sydney, Australia, August, 2016 | ||
Christine Rizkallah, Japheth Lim, Yutaka Nagashima, Thomas Sewell, Zilin Chen, Liam O'Connor, Toby Murray, Gabriele Keller and Gerwin Klein A framework for the automatic formal verification of refinement from Cogent to C International Conference on Interactive Theorem Proving, Nancy, France, August, 2016 | ||
Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O'Connor, Joel Beeren, Yutaka Nagashima, Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein and Gernot Heiser Cogent: verifying high-assurance file system implementations International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 175–188, Atlanta, GA, USA, April, 2016 | ||
Sidney Amani and Toby Murray Specifying a realistic file system Workshop on Models for Formal Analysis of Real Systems, pp. 1–9, Suva, Fiji, November, 2015 | ||
Liam O'Connor, Gabriele Keller, Sidney Amani, Toby Murray, Gerwin Klein, Zilin Chen and Christine Rizkallah CDSL version 1: Simplifying verification with linear types Technical Report, NICTA, October, 2014 | ||
|
Gabriele Keller, Toby Murray, Sidney Amani, Liam O'Connor, Zilin Chen, Leonid Ryzhyk, Gerwin Klein and Gernot Heiser File systems deserve verification too! Workshop on Programming Languages and Operating Systems (PLOS), pp. 1–7, Farmington, Pennsylvania, USA, November, 2013 A revised version of this paper was published in Operating Systems Review, Volume 48, Issue 1, January 2014, pages 58-64. | |
|
Sidney Amani, Leonid Ryzhyk and Toby Murray Towards a fully verified file system Poster presentation at EuroSys Doctoral Workshop, Bern, Switzerland, April, 2012 |