Libra: A library for reliable distributed applications
Authors
Jinsong Ouyang and Gernot Heiser
School of Computer Science and Engineering, UNSW
Sydney 2052, Australia
Abstract
This paper describes libra, a library that supports efficient, reliable distributed applications. Libra is designed to meet two objectives: to simplify the development of reliable distributed applications, and to achieve fault tolerance at low run-time cost. The first objective is met by providing fault-tolerance transparency and a simple, easy-to-use, high-level message-passing interface. Libra provides fault tolerance to applications transparently, using distributed consistent checkpointing and rollback-recovery integrated with a user-level network communication protocol. The second objective is met by protocols that minimise the communication overhead of taking a consistent distributed checkpoint and catching messages in transit, and that add little to application running times. The paper presents measurements supporting these claims.
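The abstract does not spell out libra's checkpointing protocol. As an illustration only, the sketch below uses the classic Chandy-Lamport marker algorithm (a well-known coordinated snapshot technique, not necessarily the one libra uses) to show what "taking a consistent distributed checkpoint and catching messages in transit" means: each process saves its local state, and messages already in flight when the checkpoint cut is made are recorded so that rollback-recovery can replay them. The toy bank-balance processes, the `MARKER` token, and all class and method names are hypothetical; the simulation assumes reliable FIFO channels and a single checkpoint round.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical control token separating snapshot markers from data


class Process:
    """Toy process whose state is a balance; data messages are integer transfers."""

    def __init__(self, pid, peers, balance):
        self.pid = pid
        self.peers = peers          # ids of all other processes
        self.balance = balance
        self.saved_state = None     # checkpointed balance (None = not yet checkpointed)
        self.recording = {}         # peer id -> in-transit messages caught on that channel
        self.done = set()           # incoming channels whose marker has already arrived


class Network:
    """Reliable FIFO channels between every ordered pair of processes (single round only)."""

    def __init__(self, balances):
        ids = list(balances)
        self.procs = {i: Process(i, [j for j in ids if j != i], b) for i, b in balances.items()}
        self.chan = {(i, j): deque() for i in ids for j in ids if i != j}

    def send(self, src, dst, msg):
        if msg != MARKER:
            self.procs[src].balance -= msg  # a transfer leaves the sender immediately
        self.chan[(src, dst)].append(msg)

    def _checkpoint(self, p):
        # Save local state, start recording every incoming channel, then send markers.
        p.saved_state = p.balance
        p.recording = {q: [] for q in p.peers}
        for q in p.peers:
            self.send(p.pid, q, MARKER)

    def start_checkpoint(self, pid):
        self._checkpoint(self.procs[pid])

    def deliver(self, src, dst):
        """Deliver the oldest message on channel src -> dst."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.saved_state is None:
                self._checkpoint(p)  # first marker: channel src -> dst is recorded as empty
            p.done.add(src)
        else:
            p.balance += msg
            if p.saved_state is not None and src not in p.done:
                p.recording[src].append(msg)  # sent before the cut, received after it

    def complete(self):
        return all(p.saved_state is not None and p.done == set(p.peers)
                   for p in self.procs.values())

    def rollback(self):
        """Restore the checkpoint: saved states plus replay of caught in-transit messages."""
        for c in self.chan.values():
            c.clear()
        for p in self.procs.values():
            p.balance = p.saved_state
            for src, msgs in p.recording.items():
                self.chan[(src, p.pid)].extend(msgs)


if __name__ == "__main__":
    net = Network({"A": 100, "B": 50})
    net.send("A", "B", 10)     # in flight when B starts the checkpoint
    net.start_checkpoint("B")  # B saves 50 and sends a marker to A
    net.send("A", "B", 5)      # A has not checkpointed yet
    net.deliver("B", "A")      # A sees the marker: saves 85, sends a marker to B
    for _ in range(3):
        net.deliver("A", "B")  # B receives 10 and 5 (both caught), then the marker
    # prints: True {'A': 85, 'B': 50} {'A': [10, 5]}
    print(net.complete(), {i: p.saved_state for i, p in net.procs.items()}, net.procs["B"].recording)
```

The saved states (85 and 50) plus the caught messages (10 and 5) add up to the original total of 150, which is what makes the checkpoint consistent: after a rollback, no transfer is lost and none is counted twice.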
BibTeX Entry
@inproceedings{Ouyang_Heiser_96,
  address   = {Sunnyvale, CA, USA},
  author    = {Jinsong Ouyang and Gernot Heiser},
  booktitle = {International Conference on Parallel and Distributed Processing Techniques and Applications},
  month     = aug,
  pages     = {801--810},
  paperurl  = {https://trustworthy.systems/publications/papers/Ouyang_Heiser_96.pdf},
  title     = {Libra: A Library for Reliable Distributed Applications},
  year      = {1996}
}