JEPSEN

Distributed Systems Safety Research

About Jepsen

Jepsen aims to improve the safety of distributed databases, queues, consensus systems, etc. We maintain an open source library for safety testing, and publish free, in-depth analyses of specific systems. In each analysis we explore whether the system lives up to its documentation’s claims, file new bugs, and suggest recommendations for operators. In addition to paid analysis, Jepsen offers technical talks, training classes, and consulting services.

Jepsen pushes vendors to make accurate claims and test their software rigorously, helps users choose databases and queues that fit their needs, and teaches engineers how to evaluate distributed systems correctness for themselves.

News

Recent research, analyses, and announcements.

Jepsen tested MariaDB Galera Cluster, versions 12.1.2 through 12.2.2, and found two scenarios which led to the loss of committed transactions. First, under the recommended configuration settings it does not flush data to disk before acknowledgement; committed transactions can be lost when nodes crash in quick succession. Second, it occasionally loses committed transactions with process crashes and network partitions. Even without faults, MariaDB Galera Cluster allows P4 (Lost Update), and therefore fails to satisfy its claimed isolation level “between Serializable and Repeatable Read”. It also regularly exhibits Stale Read.
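To make P4 concrete: a lost update happens when two transactions read the same version of a key, both write that key, and both commit, so one write silently overwrites the other. The following is a toy Python sketch, not Jepsen's actual checker (Elle). The `Txn` record and `lost_updates` function are hypothetical, and the sketch assumes each transaction reads a single key and then writes it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    """A committed transaction that read `key` and then wrote it (hypothetical record)."""
    name: str
    key: str
    read_value: int   # value the transaction observed
    write_value: int  # value the transaction installed

def lost_updates(txns):
    """Return pairs of committed txns that read the same value of a key and both wrote it.

    Assumes read-then-write transactions on a single key. Under a serializable
    execution, the second writer would have observed the first writer's value.
    """
    first_reader = {}  # (key, read_value) -> first txn that read that value
    pairs = []
    for t in txns:
        version = (t.key, t.read_value)
        if version in first_reader:
            pairs.append((first_reader[version].name, t.name))
        else:
            first_reader[version] = t
    return pairs

# Interleaving: T1 reads x=0, T2 reads x=0, T1 writes 1, T2 writes 1; both commit.
store = {"x": 0}
r1 = store["x"]        # T1 reads
r2 = store["x"]        # T2 reads the same version
store["x"] = r1 + 1    # T1 writes and commits
store["x"] = r2 + 1    # T2 writes and commits, overwriting T1's increment
history = [Txn("T1", "x", r1, r1 + 1), Txn("T2", "x", r2, r2 + 1)]

print(store["x"])             # 1, although two increments committed
print(lost_updates(history))  # [('T1', 'T2')]
```

Two committed increments should leave the counter at 2; ending at 1 is exactly the anomaly a database claiming something "between Serializable and Repeatable Read" should prevent.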

NATS 2.12.1

2025-12-07

NATS is a popular distributed streaming system. Jepsen tested NATS 2.12.1, focusing on its durable JetStream subsystem, and found that it could lose data or get stuck in persistent split-brain in response to file corruption or simulated node failures. This data loss was caused in part by a default fsync policy which flushed data to disk once every two minutes, rather than before acknowledgement. Even a single kernel crash or power failure, combined with process pauses or network partitions, could cause NATS replicas to lose acknowledged messages. NATS has documented its default lazy fsync setting, and is considering the other issues we found.
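The distinction between flushing before acknowledgement and flushing on a timer can be sketched with a toy append-only log. This is not NATS code; the `Log` class and its `sync_interval` parameter are hypothetical. With `sync_interval=0`, each append is fsynced before the caller gets an "ack". With a long interval, acknowledged records may exist only in the OS page cache. They survive a process crash, but a kernel crash or power failure can lose them.

```python
import os
import tempfile
import time

class Log:
    """Toy append-only log. `sync_interval` (seconds) is a hypothetical knob:
    0 fsyncs before every acknowledgement; larger values defer fsync."""

    def __init__(self, path, sync_interval=0.0):
        self.f = open(path, "ab")
        self.sync_interval = sync_interval
        self.last_sync = time.monotonic()
        self.unsynced = 0  # acknowledged records not yet known to be on disk

    def _sync(self):
        self.f.flush()             # hand Python's buffer to the kernel
        os.fsync(self.f.fileno())  # ask the kernel to persist it to stable storage
        self.last_sync = time.monotonic()
        self.unsynced = 0

    def append(self, record: bytes) -> str:
        self.f.write(record + b"\n")
        self.unsynced += 1
        if time.monotonic() - self.last_sync >= self.sync_interval:
            self._sync()
        return "ack"  # callers treat an ack as durable

    def close(self):
        self._sync()
        self.f.close()

with tempfile.TemporaryDirectory() as d:
    safe = Log(os.path.join(d, "safe.log"), sync_interval=0)
    lazy = Log(os.path.join(d, "lazy.log"), sync_interval=120)  # roughly "every two minutes"
    for i in range(3):
        safe.append(b"msg-%d" % i)
        lazy.append(b"msg-%d" % i)
    # Acknowledged but not yet fsynced, per log:
    counts = (safe.unsynced, lazy.unsynced)
    print(counts)  # (0, 3)
    safe.close()
    lazy.close()
```

In the lazy log, all three acknowledged messages are at the mercy of the page cache until the next sync. In a replicated system, correlated node failures within that window can lose them on every replica.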

Jepsen 0.3.10

2025-12-01

A new Jepsen release, 0.3.10, is now available on GitHub and Clojars. This release is aimed at controllable entropy and support for running Jepsen inside Antithesis: a deterministic simulation testing environment. A new supporting library, jepsen.generator, provides the current generator system along with jepsen.random: a new namespace for pluggable random value generation. Jepsen uses these RNGs throughout, which makes it possible to run a test with a deterministic seed, or to source entropy from an external system, like Antithesis. The jepsen.antithesis library provides additional support for assertions, randomness, and lifecycle operations, plus wrappers for clients and checkers.
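Why pluggable randomness enables deterministic runs can be sketched in a few lines. This is a Python illustration, not the jepsen.random API. The `Entropy` protocol and `gen_ops` function are hypothetical. If every random choice in a test draws from one injected source, the same seed reproduces the same operation schedule, and a different source, such as the OS entropy pool or an external system, can be swapped in without changing the generator.

```python
import random
from typing import List, Protocol

class Entropy(Protocol):
    """Any source of randomness with these two methods (hypothetical interface)."""
    def random(self) -> float: ...
    def randrange(self, stop: int) -> int: ...

def gen_ops(rng: Entropy, n: int) -> List[dict]:
    """Generate n random read/write operations, drawing all entropy from `rng`."""
    ops = []
    for _ in range(n):
        if rng.random() < 0.5:
            ops.append({"f": "read", "key": rng.randrange(5)})
        else:
            ops.append({"f": "write", "key": rng.randrange(5), "value": rng.randrange(100)})
    return ops

# Same seed, same schedule: a failing run can be replayed exactly.
a = gen_ops(random.Random(42), 10)
b = gen_ops(random.Random(42), 10)
print(a == b)  # True

# Swap in a different entropy source without touching the generator.
c = gen_ops(random.SystemRandom(), 10)
```

The design choice is that no part of the test calls a global RNG directly. Any hidden source of randomness would break replay.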

Also, this release introduces a new kind of visualization: op color plots, which show operations over time in user-defined colors. These are particularly helpful for answering questions like "when did we lose data?" or "did only read-only queries succeed during a partition?"

Jepsen and Antithesis wrote A Distributed Systems Reliability Glossary: a free reference for engineers who build, test, and operate distributed systems. It covers basic concurrency theory, consistency models, various kinds of faults, and approaches to testing, and offers links to further reading.

The latest Jepsen talk, “Jepsen 18: Serializable Mom”, is now available on YouTube. This talk was presented on June 20, 2025, at Systems Distributed in Amsterdam. It covers Bufstream 0.1.0, Amazon RDS for PostgreSQL 17.4, and TigerBeetle 0.16.1.
