Distributed Systems Safety Research

About Jepsen

Jepsen is an effort to improve the safety of distributed databases, queues, consensus systems, etc. We maintain an open source software library for systems testing, as well as blog posts and conference talks exploring particular systems' failure modes. In each analysis we explore whether the system lives up to its documentation's claims, file new bugs, and make recommendations for operators.

Jepsen pushes vendors to make accurate claims and test their software rigorously, helps users choose databases and queues that fit their needs, and teaches engineers how to evaluate distributed systems correctness for themselves.

In addition to public analyses, Jepsen offers technical talks, training classes, and distributed systems consulting services.

Recent Work

  • We worked with Cockroach Labs to refine the Jepsen test suite they wrote for CockroachDB, and found multiple bugs leading to serializability violations, all of which are now fixed.

  • Jepsen helped MongoDB identify design flaws in their v0 replication protocol and implementation bugs in its v1 replacement, all of which could lead to the loss of majority-acknowledged operations. We also collaborated with MongoDB to integrate Jepsen into their CI system. MongoDB added support for linearizable reads in October 2016.

  • Research for led to cases of dirty reads, replica divergence, and lost updates in Elasticsearch.

  • Jepsen found that document versions in do not uniquely identify a particular version of a document, allowing lost updates.

  • We worked with VoltDB to discover and fix stale and dirty reads in their SQL database, and, in uncommon configurations, two bugs leading to the loss of acknowledged updates.