JEPSEN

Bufstream 0.1.0

2024-10-24

Jepsen worked with Buf to analyze the safety of Bufstream, a Kafka-compatible streaming system. We found three safety and two liveness issues in Bufstream 0.1.0, including the loss of acknowledged writes in healthy clusters. These problems were resolved by version 0.1.3. We also discovered serious issues in Kafka’s transaction protocol, including write loss, aborted read, and torn transactions. These problems affect Kafka, Bufstream, and (presumably) other Kafka-compatible systems; they remain outstanding.

In late 2023 we reported that MySQL and MariaDB’s REPEATABLE READ did not, in fact, provide repeatable reads. The MariaDB team has been hard at work this past year. They’ve added a new flag, --innodb-snapshot-isolation=true, which causes REPEATABLE READ to prevent Lost Update, Non-repeatable Read, and violations of Monotonic Atomic View.

Jepsen has not yet tested this, but it looks like MariaDB might, with the new flag enabled, offer Snapshot Isolation at REPEATABLE READ.

Jepsen’s distributed systems training introduces engineers and operators to the fundamentals of nodes and networks, consistency and availability, techniques for replicating state, a slew of design patterns, and production concerns. By popular request, we’re offering a special session of this class that anyone can register for. Join us on Zoom, December 16th through 19th, 2024. Tickets are on sale now.

Jepsen 0.3.6

2024-10-17

Jepsen 0.3.6 is now available on GitHub and Clojars. This is a sizeable release. It includes a significant correctness bugfix for a rare bug that could make operations in the history print with the wrong data. It also adds a new namespace for composing databases, nemeses, and generators when working with systems where each node has a different role. Kafka-style tests gain new powers and are significantly faster. And we have the usual slew of small bugfixes, dependency bumps, and quality-of-life improvements. Happy testing!

The full release notes are available on GitHub.

Kyle Kingsbury spoke at Systems Distributed 2024 on RavenDB, MariaDB/MySQL, and Datomic.

jetcd 0.8.2

2024-08-07

Jepsen traced lost update, circular information flow, and aborted reads in etcd tests to an improper retry mechanism in jetcd 0.8.2. This mechanism allowed transactions to be submitted multiple times, and allowed committed transactions to appear as if they had failed.
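A toy model makes this failure mode concrete. The following is a hypothetical Python illustration, not jetcd's actual code or API: `ToyKV`, `send`, and `naive_retry` are invented names. It assumes the server commits the first attempt but the reply is lost, and the client then blindly resubmits the same transaction.

```python
class ToyKV:
    """Hypothetical in-memory store standing in for an etcd-like server."""

    def __init__(self):
        self.data = {}

    def txn_cas(self, key, expected, new):
        # Compare-and-set: succeeds only if key currently equals `expected`.
        if self.data.get(key) == expected:
            self.data[key] = new
            return True
        return False

    def txn_incr(self, key):
        # Non-idempotent increment.
        self.data[key] = self.data.get(key, 0) + 1
        return True


class LostResponse(Exception):
    """The request committed on the server, but the client never saw the reply."""


def send(server, op, *args, drop_reply=False):
    result = getattr(server, op)(*args)  # the server applies the operation...
    if drop_reply:
        raise LostResponse()  # ...but the reply is lost in transit
    return result


def naive_retry(server, op, *args):
    """Hypothetical client that blindly resubmits on any error (the unsafe pattern)."""
    try:
        return send(server, op, *args, drop_reply=True)  # first reply is lost
    except LostResponse:
        return send(server, op, *args)  # resubmit the same transaction


kv = ToyKV()
print(naive_retry(kv, "txn_incr", "n"), kv.data["n"])  # True 2: applied twice

kv.data["x"] = 0
print(naive_retry(kv, "txn_cas", "x", 0, 1), kv.data["x"])  # False 1: committed, reported as failed
```

In this sketch, the increment is applied twice, and the compare-and-set reports failure even though its first attempt committed. A safer client treats a lost reply as an indeterminate outcome rather than a definite failure, and only retries automatically when the operation is idempotent.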

Kyle Kingsbury will speak on performance techniques in Jepsen at GOTO Chicago, October 21 & 22, 2024. The talk will touch on a mix of high-level and low-level performance optimizations to make checking large histories tractable, including parallelism, pure functions, immutable data structures, and deforestation; bitsets, avoiding sharing between threads, packing structures into mutable arrays, dynamic compilation of primitive boxes, and macro iteration magic.

Early bird tickets are on sale now.

We’ve made some small changes to the Jepsen ethics policy.

The policy used to promise that Jepsen could veto publication if Jepsen and a client could not agree on the content of an analysis. However, this veto has never been used. In fact, Jepsen’s contracts have given Jepsen final approval over the content of analyses since 2016. We replace the promise of a veto with a stronger promise of editorial control.

In light of Jepsen’s multiple authors, we also shift to an organizational third person voice. Finally, we’ve streamlined some language.

In collaboration with Nubank, we analyzed Datomic Pro 1.0.7075 and found that its inter-transaction safety properties appeared stronger than claimed. Datomic Pro appeared to offer Strong Session Serializable isolation, and Strong Serializable for histories restricted to update transactions. However, Datomic defines unusual intra-transaction semantics in which operations within a transaction are applied logically concurrently with one another, rather than sequentially. While consistent with Datomic’s documentation, this could cause invariants preserved by individual transaction functions to be broken when several of those functions are combined in a single transaction.
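A toy model shows how concurrent intra-transaction semantics can break an invariant that each function preserves on its own. This is a hypothetical Python sketch, not Datomic's API: the `add_member` function, the two-member capacity, and merging results by set union are assumptions chosen for illustration. Applied sequentially, each call sees the previous call's effect; applied concurrently, both calls see the same pre-transaction state.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, Set[str]]
# A transaction function reads a state and returns the elements it wants to add.
TxFn = Callable[[State], Set[str]]

CAPACITY = 2  # hypothetical invariant: a team has at most 2 members


def add_member(name: str) -> TxFn:
    """Hypothetical transaction function: add `name` only if the team has room."""
    def fn(state: State) -> Set[str]:
        if len(state["team"]) < CAPACITY:
            return {name}
        return set()
    return fn


def apply_sequential(state: State, fns: List[TxFn]) -> State:
    # Each function observes the effects of the functions before it.
    team = set(state["team"])
    for fn in fns:
        team |= fn({"team": team})
    return {"team": team}


def apply_concurrent(state: State, fns: List[TxFn]) -> State:
    # Every function observes the same pre-transaction state; results are merged.
    additions: Set[str] = set()
    for fn in fns:
        additions |= fn(state)
    return {"team": state["team"] | additions}


initial = {"team": {"alice"}}
txn = [add_member("bob"), add_member("carol")]
print(sorted(apply_sequential(initial, txn)["team"]))  # ['alice', 'bob']
print(sorted(apply_concurrent(initial, txn)["team"]))  # ['alice', 'bob', 'carol']
```

Under the concurrent model, both calls see one member and both add someone, so the team ends up with three members even though each function individually enforces the limit of two.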

RavenDB 6.0.2

2024-01-30

In a brief survey of RavenDB 6.0.2, we found “ACID” transactions allowed both lost updates and fractured reads, even in healthy single-node clusters. Depending on how you interpret RavenDB’s documentation and response to this work, RavenDB may not have interactive transactions at all.

MySQL 8.0.34

2023-12-18

We revisited Kleppmann’s work on MySQL isolation levels and found surprising behavior in 8.0.34. MySQL’s REPEATABLE READ not only exhibits G2-item, G-single, and lost update, but also violates internal consistency and Monotonic Atomic View. It satisfies neither Adya’s Repeatable Read nor the ambiguous ANSI SQL definition. We also discovered that AWS RDS MySQL clusters routinely violate Serializability at the SERIALIZABLE isolation level.

Jepsen and Redpanda worked together to analyze the Redpanda distributed queue, versions 21.10.1 through 21.11.2. We found three liveness and seven safety issues, including crashes, aborted reads, inconsistent offsets, circular information flow, and lost or stale messages. We also discuss surprising aspects of the Kafka/Redpanda transaction model. Redpanda has resolved seven of the issues we found, although lost/stale messages and one crash remain under investigation. A few other issues require only documentation to clarify intended behavior.

We analyzed the Radix DLT distributed ledger system at versions 1.0-beta.35.1, 1.0.0, 1.0.1, and 1.0.2. We found 11 safety errors, including stale reads, aborted and intermediate reads, and the partial or total loss of committed transactions from the transaction log. We also identified issues with transactions that hung indefinitely, and with degraded performance under single-node faults. Following a system redesign, RDX Works reports that all safety issues, as well as the indefinite transaction issue, were fixed in version 1.1.0.

Scylla 4.2-rc3

2020-12-22

Together with the ScyllaDB team, we found seven problems in Scylla, including lightweight transaction (LWT) split-brain in healthy clusters due to (a) incomplete row hashcodes and (b) multiple problems with membership changes. We also identified incomplete or inaccurate documentation, including claims that non-LWT operations were isolated and atomic, and undocumented rules about what kinds of membership operations were legal. Scylla has corrected almost all of these errors via patches and documentation; remaining cases of split-brain appear limited to concurrent membership changes.

We helped Redis Labs verify early development builds of Redis-Raft, and found twenty-one issues, including crashes, split-brain, infinite loops, aborted reads, data corruption, and total data loss on any failover. Redis Labs has addressed all but one of these issues in recent versions, and continues to work towards general availability in 2021.

PostgreSQL 12.3

2020-06-11

Jepsen identified a serializability violation in PostgreSQL 12.3, where concurrent inserts and updates could result in transactions which fail to observe each other’s effects. This bug appears to have been present since the implementation of serializable snapshot isolation in version 9.1. A patch should be available in the next minor release, currently scheduled for August 13th.

MongoDB 4.2.6

2020-05-14

A brief investigation into MongoDB 4.2.6’s transaction system found violations of snapshot isolation, rather than the claimed “full ACID” guarantees. Weak defaults allowed transactions to lose writes and to read stale data unless carefully controlled.

Dgraph 1.1.1

2020-04-29

Dgraph Labs, makers of the Dgraph graph database, worked with Jepsen to follow up on our 2018 analysis of Dgraph 1.0.2. We are pleased to report that Dgraph Labs resolved all issues from this previous work, and together, we identified five new safety issues in version 1.1.1, all involving tablet migration. Three of these issues appear addressed, and the remainder are under investigation.

etcd 3.4.3

2020-01-29

Jepsen and the etcd team worked to verify etcd’s key-value operations, locks, and watches. We found no problems with key-value operations or watches: kv operations, including multi-key transactions, appeared to be strict serializable, and watches delivered every change to keys in order, with the exception of an undocumented edge case around revision zero. Locks, however, were fundamentally unsafe: they did not, and cannot, guarantee mutual exclusion in asynchronous networks. This theoretical risk was exacerbated by an implementation bug, now fixed in master. The etcd team is preparing new documentation around safety guarantees.

We collaborated with YugaByte to validate the beta support for serializable SQL transactions in YugaByte DB 1.3.1. We found two safety issues: a race condition allowing DEFAULT columns to be initialized to NULL, and a serializability violation (G2-item) associated with tablet servers losing their connection to master nodes. We also found several problems with DDL statements like table creation, and some availability issues, such as slow recovery during network partitions, and process leaks. YugaByte has addressed the G2 issue as well as some of the other bugs, and is working on remaining issues.

TiDB 2.1.7

2019-06-11

Together with PingCAP, we tested TiDB 2.1.7 through 3.0.0-rc.2, and found that by default, TiDB violated snapshot isolation, allowing read skew, lost updates, and incompatible orders, thanks to two separate auto-retry mechanisms which blindly re-applied conflicting writes. Version 3.0.0-rc.2 changed the default settings to provide snapshot isolation, and also fixed a race condition in table creation and some crashes on startup.

In cooperation with YugaByte, we identified three safety issues in YugaByte DB, including read skew and data corruption under normal operating conditions, read skew and data corruption during clock skew, and, under rare circumstances involving multiple network partitions, the loss of acknowledged inserts. YugaByte DB addressed all three of these problems, as well as several other issues, in 1.2.0.

FaunaDB 2.5.4

2019-03-04

Fauna and Jepsen worked together to identify 19 issues in FaunaDB, a distributed transactional document store based on the Calvin protocol. In addition to minor problems with schema changes and availability, we found that indices could fail to return negative integer values, skip records at the end of pages, and exhibit read skew. Temporal queries could observe older states than the timestamp they requested, and exhibited occasional read skew. We also found nonmonotonic reads, long fork, and read skew in development releases. We found that basic key-value operations in FaunaDB 2.5.4 appeared to provide snapshot isolation up to strict serializability, depending on workload. FaunaDB 2.6.0 addressed every serious issue we identified, and fixes for minor issues, like concurrent schema modification, are slated for upcoming releases.

MongoDB 3.6.4

2018-10-22

We worked with MongoDB to explore sharded single-document consistency and MongoDB’s new support for causally consistent sessions. We found that sharded clusters appeared to preserve single-document linearizability and did not lose inserted documents during shard rebalancing and network partitions. However, causal sessions required majority reads and writes; at the default consistency levels, causal sessions did not preserve claimed ordering invariants. MongoDB has updated their documentation to reflect this.

Dgraph 1.0.2

2018-08-22

We analyzed Dgraph, a distributed graph database, and identified numerous deadlocks, crashes, and consistency violations, including the loss and corruption of records, even in healthy clusters. Dgraph addressed many of these issues during our collaboration, and continues to work on remaining problems.

Together with Aerospike, we validated their next-generation consensus system, confirming two known data-loss scenarios due to process pauses and crashes, and discovering a previously unknown bug in their internal RPC proxy mechanism which allowed clients to see successfully applied updates as definite failures. Aerospike fixed this bug, added an option to require nodes write to disk before acknowledging operations to clients, and plans to extend the maximum clock skew their consensus system can tolerate.

Hazelcast 3.8.3

2017-10-05

Jepsen demonstrated numerous problems with data loss in Hazelcast, an in-memory data grid: map updates could be lost, atomic references were not atomic, ID generators generated duplicate IDs, locks were not exclusive, and queues could lose acknowledged messages.

Jepsen worked with Tendermint to evaluate their distributed, linearizable, Byzantine-fault-tolerant blockchain system. We were unable to find issues with their replication algorithm, but did discover single-node crashes and issues with crash recovery that could lead to unavailability or data loss.

We worked with Cockroach Labs to refine the Jepsen test suite they wrote for CockroachDB, and found multiple bugs leading to serializability violations, all of which are now fixed.

Jepsen helped MongoDB identify design flaws in their v0 replication protocol and implementation bugs in its v1 replacement, all of which could lead to the loss of majority-acknowledged operations. We also collaborated with MongoDB to integrate Jepsen into their CI system. MongoDB added support for linearizable reads in October 2016.

Research for Crate.io uncovered dirty reads, replica divergence, and lost updates in Elasticsearch.

VoltDB 6.3

2016-07-11

We worked with VoltDB to discover and fix stale and dirty reads in their SQL database, and, in uncommon configurations, two bugs leading to the loss of acknowledged updates.