JEPSEN

Distributed Systems Safety Research

About Jepsen

Jepsen is an effort to improve the safety of distributed databases, queues, consensus systems, etc. We maintain an open source software library for systems testing, as well as blog posts and conference talks exploring particular systems’ failure modes. In each analysis we explore whether the system lives up to its documentation’s claims, file new bugs, and suggest recommendations for operators.

Jepsen pushes vendors to make accurate claims and test their software rigorously, helps users choose databases and queues that fit their needs, and teaches engineers how to evaluate distributed systems correctness for themselves.

In addition to public analyses, Jepsen offers technical talks, training classes, and a variety of consulting services.

Other Resources

Recent Work

In a brief survey of RavenDB 6.0.2, we found “ACID” transactions allowed both lost updates and fractured read, even in healthy single-node clusters. Depending on how you interpret RavenDB’s documentation and response to this work, RavenDB may not have interactive transactions at all.
We revisited Kleppmann’s work on MySQL isolation levels and found surprising behavior in 8.0.34. MySQL’s REPEATABLE READ not only exhibits G2-item, G-single, and lost update, but also violates internal consistency and Monotonic Atomic View. It satisfies neither Adya’s Repeatable Read nor the ambiguous ANSI SQL definition. We also discovered AWS RDS MySQL clusters routinely violate Serializability at the SERIALIZABLE isolation level.
Jepsen and Redpanda worked together to analyze the Redpanda distributed queue, versions 21.10.1 through 21.11.2. We found three liveness and seven safety issues, including crashes, aborted reads, inconsistent offsets, circular information flow, and lost or stale messages. We also discuss surprising aspects of the Kafka/Redpanda transaction model. Redpanda has resolved seven of the issues we found, although lost/stale messages and one crash remain under investigation. A few other issues require only documentation to clarify intended behavior.
We analyzed the Radix DLT distributed ledger system at version 1.0-beta.35.1, 1.0.0, 1.0.1, and 1.0.2. We found 11 safety errors including stale reads, aborted and intermediate reads, and the partial or total loss of committed transactions from transaction log. We also identified issues with transactions which hung indefinitely, and degraded performance under single-node faults. Following a system redesign RDX Works reports all safety issues, as well as the indefinite transaction issue, were fixed in version 1.1.0.
Together with the ScyllaDB team, we found seven problems in Scylla, including lightweight transaction (LWT) split-brain in healthy clusters due to a.) incomplete row hashcodes and b.) multiple problems with membership changes. We also identified incomplete or inaccurate documentation, including claims that non-LWT operations were isolated and atomic, and undocumented rules about what kinds of membership operations were legal. Scylla has corrected almost all of these errors via patches and documentation; remaining cases of split-brain appear limited to concurrent membership changes.
We helped Redis Labs verify early development builds of Redis-Raft, and found twenty one issues, including crashes, split-brain, infinite loops, aborted reads, data corruption, and total data loss on any failover. Redis Labs has addressed all but one of these issues in recent versions, and continues to work towards general availability in 2021.
Jepsen identified a serializability violation in PostgreSQL 12.3, where concurrent inserts and updates could result in transactions which fail to observe each other’s effects. This bug appears to have been present since the implementation of serializable snapshot isolation in version 9.1. A patch should be available in the next minor release, currently scheduled for August 13th.
A brief investigation into MongoDB 4.2.6’s transaction system found violations of snapshot isolation, rather than claimed “full ACID” guarantees. Weak defaults allowed transactions to lose writes and allow stale reads unless carefully controlled.
Dgraph Labs, makers of the Dgraph graph database, worked with Jepsen to follow up on our 2018 analysis of Dgraph 1.0.2. We are pleased to report that Dgraph Labs resolved all issues from this previous work, and together, we identified five new safety issues in version 1.1.1, all involving tablet migration. Three of these issues appear addressed, and the remainder are under investigation.
Jepsen and the etcd team worked to verify etcd’s key-value operations, locks, and watches. We found no problems with key-value operations or watches: kv operations, including multi-key transactions, appeared to be strict serializable, and watches delivered every change to keys in order, with the exception of an undocumented edge case around revision zero. Locks, however, were fundamentally unsafe: they did not, and can not, guarantee mutual exclusion in asynchronous networks. This theoretical risk was exacerbated by an implementation bug—now fixed in master. The etcd team is preparing new documentation around safety guarantees.
We collaborated with YugaByte to validate YugaByte DB at version 1.3.1’s beta support for serializable SQL transactions. We found two safety issues: a race condition allowing DEFAULT columns to be initialized to NULL, and a serializability violation (G2-item) associated with tablet servers losing their connection to master nodes. We also found several problems with DDL statements like table creation, and some availability issues, such as slow recovery during network partitions, and process leaks. YugaByte has addressed the G2 issue as well as some of the other bugs, and is working on remaining issues.
Together with PingCAP, we tested TiDB 2.1.7 through 3.0.0-rc.2, and found that by default, TiDB violated snapshot isolation, allowing read skew, lost updates, and incompatible orders, thanks to two separate auto-retry mechanisms which blindly re-applied conflicting writes. 3.0.0-rc.2 changed the default settings to provide snapshot isolation by default, as well as fixing a race condition in table creation, and some crashes on startup.
In cooperation with YugaByte, we identified three safety issues in YugaByte DB, including read skew and data corruption under normal operating conditions, read skew and data corruption during clock skew, and, under rare circumstances involving multiple network partitions, the loss of acknowledged inserts. YugaByte DB addressed all three of these problems, as well as several other issues, in 1.2.0.
Fauna and Jepsen worked together to identify 19 issues in FaunaDB, a distributed transactional document store based on the Calvin protocol. In addition to minor problems with schema changes and availability, we found that indices could fail to return negative integer values, skip records at the end of pages, and exhibit read skew. Temporal queries could observe older states than the timestamp they requested, and exhibited occasional read skew. We also found nonmonotonic reads, long fork, and read skew in development releases. We found that basic key-value operations in FaunaDB 2.5.4 appeared to provide snapshot isolation up to strict serializability, depending on workload. FaunaDB has addressed every serious issue we identified in version 2.6.0, and fixes for minor issues, like concurrent schema modification, are slated for upcoming releases.
We worked with MongoDB to explore sharded single-document consistency and MongoDB’s new support for causally consistent sessions. We found that sharded clusters appeared to preserve single-document linearizability and did not lose inserted documents during shard rebalancing and network partitions. However, causal sessions required majority reads and writes; at the default consistency levels, causal sessions did not preserve claimed ordering invariants. MongoDB has updated their documentation to reflect this.