Trustworthy Systems

Libra: A library for reliable distributed applications

Authors

Jinsong Ouyang and Gernot Heiser

    School of Computer Science and Engineering
    UNSW,
    Sydney 2052, Australia

Abstract

This paper describes Libra, a library that supports efficient, reliable distributed applications. Libra is designed to meet two objectives: to simplify the development of reliable distributed applications, and to achieve fault tolerance at low run-time cost. The first objective is met by providing fault-tolerance transparency and a simple, easy-to-use, high-level message-passing interface. Libra provides fault tolerance to applications transparently, based on distributed consistent checkpointing and rollback recovery integrated with a user-level network communication protocol. The second objective is met by using protocols that minimise the communication overhead of taking a consistent distributed checkpoint and catching messages in transit, and that impose low overhead on running time. The paper presents measurements that back up these claims.

BibTeX Entry

  @inproceedings{Ouyang_Heiser_96,
    address          = {Sunnyvale, CA, USA},
    author           = {Jinsong Ouyang and Gernot Heiser},
    booktitle        = {International Conference on Parallel and Distributed Processing Techniques and Applications},
    month            = aug,
    pages            = {801--810},
    paperurl         = {https://trustworthy.systems/publications/papers/Ouyang_Heiser_96.pdf},
    title            = {Libra: A Library for Reliable Distributed Applications},
    year             = {1996}
  }
