
Libra: A library for reliable distributed applications


Jinsong Ouyang and Gernot Heiser

    School of Computer Science and Engineering
    Sydney 2052, Australia


This paper describes libra, a library that supports efficient, reliable distributed applications. Libra is designed to meet two objectives: to simplify the development of reliable distributed applications, and to achieve fault tolerance at low run-time cost. The first objective is met by making fault tolerance transparent to applications and by providing a simple, easy-to-use, high-level message-passing interface. Libra's fault tolerance is based on distributed consistent checkpointing and rollback recovery, integrated with a user-level network communication protocol. The second objective is met by protocols that minimise the communication overhead of taking a consistent distributed checkpoint and of catching messages in transit, and that add little to application running times. The paper presents measurements supporting these claims.
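The abstract does not spell out libra's checkpointing protocol, so the following is only an illustrative sketch of what a "consistent distributed checkpoint" that also catches messages in transit means. It uses the classic Chandy-Lamport marker algorithm as a stand-in, which is not necessarily the protocol libra uses. It simulates processes in a single thread with FIFO channels. The names `Process`, `Network`, `MARKER`, `transfer`, `start_checkpoint`, `checkpoint` and `rollback` are hypothetical and are not libra's API.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel used as the checkpoint marker


class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance      # application state
        self.peers = peers          # ids of all other processes
        self.recorded_state = None  # checkpointed local state (None = not taken yet)
        self.recording = {}         # incoming channel src -> messages caught in transit
        self.done = set()           # incoming channels whose marker has arrived


class Network:
    """Hypothetical simulation: FIFO channels between every ordered pair of processes."""

    def __init__(self, balances):
        ids = list(range(len(balances)))
        self.procs = [Process(i, b, [j for j in ids if j != i]) for i, b in enumerate(balances)]
        self.chan = {(i, j): deque() for i in ids for j in ids if i != j}

    def send(self, src, dst, msg):
        self.chan[(src, dst)].append(msg)

    def transfer(self, src, dst, amount):
        # application message: move `amount` from src to dst
        self.procs[src].balance -= amount
        self.send(src, dst, amount)

    def _record(self, p):
        # record local state, start recording all incoming channels, send markers
        p.recorded_state = p.balance
        p.recording = {q: [] for q in p.peers}
        for q in p.peers:
            self.send(p.pid, q, MARKER)

    def start_checkpoint(self, pid):
        self._record(self.procs[pid])

    def deliver(self, src, dst):
        # deliver the oldest message on channel src -> dst; False if channel is empty
        if not self.chan[(src, dst)]:
            return False
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self._record(p)  # first marker: this channel's recorded state stays empty
            p.done.add(src)      # stop recording this channel
        else:
            p.balance += msg
            if p.recorded_state is not None and src not in p.done:
                p.recording[src].append(msg)  # sent before src's checkpoint, received after ours
        return True

    def run_until_quiet(self):
        progress = True
        while progress:
            progress = False
            for src, dst in list(self.chan):
                progress |= self.deliver(src, dst)

    def checkpoint(self):
        # global checkpoint = recorded local states + messages caught in transit
        states = [p.recorded_state for p in self.procs]
        in_transit = {(q, p.pid): list(m) for p in self.procs for q, m in p.recording.items()}
        return states, in_transit

    def rollback(self, states, in_transit):
        # restore every process and re-inject the messages that were in transit
        for p, s in zip(self.procs, states):
            p.balance = s
            p.recorded_state, p.recording, p.done = None, {}, set()
        for ch in self.chan.values():
            ch.clear()
        for (src, dst), msgs in in_transit.items():
            self.chan[(src, dst)].extend(msgs)
```

For example, with `net = Network([100, 100, 100])`, calling `net.transfer(0, 1, 30)`, `net.start_checkpoint(0)`, `net.transfer(1, 2, 20)` and then `net.run_until_quiet()` yields recorded states `[70, 110, 100]` with the 20 units on channel 1 → 2 caught in transit, so the checkpoint still accounts for all 300 units. The in-transit messages are what make rollback correct: restoring only the local states would silently lose them.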

BibTeX Entry

    address          = {Sunnyvale, CA, USA},
    author           = {Jinsong Ouyang and Gernot Heiser},
    booktitle        = {International Conference on Parallel and Distributed Processing Techniques and Applications},
    month            = aug,
    pages            = {801--810},
    title            = {Libra: A Library for Reliable Distributed Applications},
    year             = {1996}