JEPSEN

Dependencies

Many consistency models and their corresponding phenomena are defined in terms of dependencies between operations. This page offers an informal explanation of transactional dependencies, adapted from Adya’s Direct Serialization Graphs and related work.¹

This is an informal pastiche of several models of transactional isolation.² We hope to give readers an intuitive understanding of the concepts used in Jepsen reports and throughout the literature. For a more rigorous introduction to these concepts, we recommend:

Adya’s 1999 Weak Consistency (see chapter three)
Fekete, Liarokapis, O’Neil, & O’Neil’s Making Snapshot Isolation Serializable
Cerone & Gotsman’s Analysing Snapshot Isolation

Fundamental Concepts

A database is a set of objects, often called x, y, and so on. Each object has many versions, which we write x_i, x_j, etc. Processes perform operations, also called transactions, on objects. A transaction is a list of micro-operations: any number of writes or reads, followed by exactly one commit or abort.

There are two kinds of reads and writes: item and predicate. An item read observes a specific version of one object x_i. An item write observes some version x_i, and produces a new version of the same object by applying some function to it: x_j = f(x_i).

A predicate is a function which takes a version and returns true or false. We imagine that the database executes a predicate operation by choosing a version set: one version for every object in the database. It then filters those versions to just those where the predicate matched: i.e., returned true. Finally, it reads or writes all matching objects.³ For instance, a chef might sprinkle cheese on every omelette in the kitchen. This is a predicate write. The version set is some (presumably, the current) version of every dish in the kitchen. Only some of those dishes (the ones containing omelettes) are modified, producing new versions of those dishes with cheese on top.

We say that a transaction T_i installs version x_i when T_i commits, and x_i is T_i’s final write of object x. We model creation and deletion as writes, using special unborn and dead versions. Predicates never match unborn or dead versions.
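The "installs" rule can be made concrete with a minimal sketch. It assumes a hypothetical encoding, not taken from Adya or from any Jepsen tool, in which a transaction is a list of tuples: ("r", object, version), ("w", object, version), and a final ("commit",) or ("abort",).

```python
def installed_versions(micro_ops):
    """Return {object: version} for the versions this transaction installs.

    Assumed encoding (hypothetical): micro_ops is a list of tuples such as
    ("r", "x", "x0"), ("w", "x", "x1"), ending in ("commit",) or ("abort",).
    """
    # Aborted (or unfinished) transactions install nothing.
    if not micro_ops or micro_ops[-1] != ("commit",):
        return {}
    final = {}
    for op in micro_ops[:-1]:
        if op[0] == "w":
            _, obj, version = op
            final[obj] = version  # a later write to the same object wins
    return final


txn = [("w", "x", "x1"), ("r", "y", "y0"), ("w", "x", "x2"), ("commit",)]
installed = installed_versions(txn)  # {"x": "x2"}: x1 is only an intermediate version
```

Only the final write of each object counts, and only if the transaction commits; x1 above is never installed.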

A history is a set of transactions, a partial order over their micro-operations which describes the sequence in which the operations (apparently) executed,⁴ and a version order. The version order is total over all installed versions of any particular object. It says, for each version of x, what version of x came next. Note that the version order may differ from the real-time or per-process order of events. The database is allowed to choose any order it likes, so long as some order exists.

Given a history H, we can define a serialization graph whose nodes are transactions, and whose edges are dependencies between those transactions.

This graph is often called DSG(H).⁵ These dependencies capture the flow of data between transactions, and come in three main flavors:

Write-write
Write-read
Read-write

Two additional dependencies are defined based on the order in which those transactions were performed.

Process
Real-time

Together these dependencies allow us to trace the flow of data and time through a history. Many phenomena correspond to cycles in this graph.
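As a rough illustration of looking for cycles in such a graph, here is a sketch using a depth-first search. The edge representation, (src, dst, kind) triples meaning "dst depends on src", and the function name are assumptions made for this example, not part of any formalism cited here.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of transaction names, or None if acyclic.

    `edges` is an iterable of (src, dst, kind) triples (an assumed encoding)
    meaning "dst depends on src" via a dependency of the given kind.
    """
    graph = defaultdict(list)
    for src, dst, _kind in edges:
        graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # nxt is already on the current path: we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


cycle = find_cycle([("T1", "T2", "ww"), ("T2", "T1", "ww")])  # ["T1", "T2", "T1"]
```

This sketch ignores edge kinds. Classifying a phenomenon generally also requires checking which kinds of edges make up the cycle.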

Write-Write

Intuitively, a write-write dependency (also known as a ww or write dependency) means that one transaction overwrote another’s write. Specifically, given two transactions T_i and T_j, T_j write-write depends on T_i if:

T_i installs some version x_i, and T_j installs the next version of x in the version order. This is an item write-write dependency.
T_i performs a predicate write for which the system selected some version of an object x_i, and T_j installs some later version of x which changes whether the predicate matches. This is one type of predicate write-write dependency. Note that the predicate does not actually have to match x_i; what matters is that whether the predicate matched was affected by T_i’s write.
T_i installs some version x_i, and T_j performs a predicate write for which the system selects x_i. This is the second form of predicate write-write dependency.
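The item case can be sketched in a few lines. The sketch assumes two hypothetical inputs: version_order, mapping each object to its installed versions in version order, and installed_by, mapping each version to the transaction that installed it. Predicate dependencies are not covered.

```python
def item_ww_edges(version_order, installed_by):
    """Derive item write-write edges as (T_i, T_j, "ww") triples.

    version_order: {object: [versions in version order]}  (assumed encoding)
    installed_by:  {version: transaction that installed it}
    """
    edges = set()
    for versions in version_order.values():
        # Pair each installed version with its immediate successor.
        for prev, nxt in zip(versions, versions[1:]):
            t_i, t_j = installed_by[prev], installed_by[nxt]
            if t_i != t_j:
                edges.add((t_i, t_j, "ww"))
    return edges


edges = item_ww_edges(
    {"x": ["x1", "x2", "x3"]},
    {"x1": "T1", "x2": "T2", "x3": "T3"},
)
# edges == {("T1", "T2", "ww"), ("T2", "T3", "ww")}
```

Note that T3 does not directly write-write depend on T1: only adjacent versions in the version order produce an item edge.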

For example, imagine you’re baking a batch of brownies in a transaction, and begin by pouring chocolate powder into the bowl. Your friend, in their own transaction, helpfully washes the bowl clean. Their transaction write-write depends on yours, because you both modified the bowl, and their change directly overwrote yours. This feels suspicious, but it’s not necessarily bad. We clean dirty dishes all the time, and it could be that you never planned to use that bowl again. Perhaps your transaction was over! If so, no harm has been done.⁶

Now imagine you go to the next step in the recipe, and pour flour into the sparkling-clean bowl. Your transaction now write-write depends on your friend’s. These two edges form a write cycle: your writes interleaved with one another, and the brownies are ruined.

Write-Read

A write-read dependency (also known as a wr or read dependency) means that one transaction observed another’s write. Specifically, transaction T_j write-read depends on T_i if T_i installs some version of an object x_i, and T_j either:

Reads x_i. This is an item write-read dependency.
Performs a predicate read for which the system selects version x_i. This is a predicate write-read dependency. Again, the predicate need not actually match x_i.
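A minimal sketch of the item case follows, assuming reads are recorded as hypothetical (reader, version) pairs and installed_by maps each installed version to its writer. Predicate reads are left out.

```python
def item_wr_edges(reads, installed_by):
    """Derive item write-read edges as (T_i, T_j, "wr") triples.

    reads:        iterable of (reader transaction, version read)  (assumed encoding)
    installed_by: {version: transaction that installed it}
    """
    edges = set()
    for reader, version in reads:
        writer = installed_by.get(version)
        # Skip versions with no known installer, and a transaction's own writes.
        if writer is not None and writer != reader:
            edges.add((writer, reader, "wr"))
    return edges


# The kitchen example: your friend sees your brownies, you see their milk.
edges = item_wr_edges(
    reads=[("friend", "kitchen_1"), ("you", "table_1")],
    installed_by={"kitchen_1": "you", "table_1": "friend"},
)
# edges == {("you", "friend", "wr"), ("friend", "you", "wr")}
```

The two resulting edges point in opposite directions, which is the cyclic information flow described in the example.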

For example, imagine you’re baking a batch of brownies in a transaction. Your friend, in another transaction, pops into the kitchen and sees that you’re making brownies. Their transaction write-read depends on yours, because they observed your changes to the kitchen.

Now imagine that your friend pours two glasses of milk to go with the brownies, and you see them do it. Your transaction now write-read depends on theirs. Because you have each seen each other’s effects, your transactions exhibit cyclic information flow.

To understand why a predicate need not match a version in order for that version to be considered a dependency, imagine an eager student at the library, who performs a transaction to look at every book about kiwis in the stacks. Now imagine that just prior, a professor had checked out “Kiwi: A Natural History”; it is no longer on the shelf. The student’s transaction predicate write-read depends on the professor’s, because “Kiwi: A Natural History” would have matched, but no longer appears. In a sense, the student observes the professor’s check-out transaction by failing to see a specific book.

Read-Write

A read-write dependency (also known as a rw or anti-dependency) is the converse of a write-read dependency. It captures the notion of a transaction T_i observing some state, then T_j overwriting that state. Specifically, T_j read-write depends on T_i if either:

T_i reads some version x_i, and T_j installs the next version of x in the version order. This is an item read-write dependency.
T_i performs a predicate read for which the system selects some version x_i, and T_j installs some later version of x which changes whether the predicate matches. This is a predicate read-write dependency.
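The item case combines a read with the version order. This sketch assumes hypothetical (reader, object, version) read records, and that every version read appears in the version order; predicate reads are not handled.

```python
def item_rw_edges(reads, version_order, installed_by):
    """Derive item read-write edges as (T_i, T_j, "rw") triples.

    reads:         iterable of (reader, object, version read)  (assumed encoding)
    version_order: {object: [versions in version order]}
    installed_by:  {version: transaction that installed it}
    """
    edges = set()
    for reader, obj, version in reads:
        versions = version_order[obj]
        idx = versions.index(version)
        if idx + 1 < len(versions):
            # Whoever installed the very next version overwrote what we read.
            writer = installed_by[versions[idx + 1]]
            if writer != reader:
                edges.add((reader, writer, "rw"))
    return edges


# You check the chocolate; your friend then installs the next version of it.
edges = item_rw_edges(
    reads=[("you", "chocolate", "choc_0")],
    version_order={"chocolate": ["choc_0", "choc_1"]},
    installed_by={"choc_1": "friend"},
)
# edges == {("you", "friend", "rw")}
```

The edge runs from reader to overwriter: the friend's transaction read-write depends on yours.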

For example, let’s say you’d like to bake some brownies. You begin by checking the cupboards for all the ingredients, ensuring that you have enough butter, chocolate, eggs, and so on. Then your friend waltzes into the kitchen, pops open the cupboard for a snack, and begins munching on the chocolate. Their transaction read-write depends on yours: they altered the state of the chocolate you observed.

Next, imagine you go to add the chocolate to your batter. You look once more at the container, and realize that there is no longer enough! You now write-read depend on their snacking transaction.⁷ This is an example of G-single. It is also a fractured read, since at different points in the baking process you both observed and did not observe another transaction’s effects.

Process

We model a distributed system as a collection of single-threaded processes. Each process does one operation at a time. We might want to ensure that when a process performs an operation, it takes place logically “after” the previous operation that process executed. A process dependency captures this relationship. Specifically, transaction T_j process depends on transaction T_i if both transactions are executed by the same process, and the process executed T_i before T_j.
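Process dependencies follow directly from each process's own sequence of transactions. A sketch, assuming a hypothetical mapping from process name to the list of transactions it executed, in order:

```python
from itertools import combinations


def process_edges(process_histories):
    """Derive process edges as (T_i, T_j, "process") triples.

    process_histories: {process: [transactions in execution order]}  (assumed encoding)
    """
    edges = set()
    for txns in process_histories.values():
        # combinations preserves list order, so `earlier` always precedes `later`.
        for earlier, later in combinations(txns, 2):
            edges.add((earlier, later, "process"))
    return edges


edges = process_edges({"p1": ["T1", "T3", "T4"], "p2": ["T2"]})
# edges == {("T1", "T3", "process"), ("T1", "T4", "process"), ("T3", "T4", "process")}
```

This emits every ordered pair, matching the definition literally. For cycle detection alone, edges between consecutive transactions would suffice, since the rest follow by transitivity.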

This is the same order used in Sequential consistency. Like Sequential, it does not capture real-time ordering. One process may lag arbitrarily far behind another.

Adding process edges to a graph with (e.g.) write-write, write-read, and read-write dependencies allows us to find cycles where (for example) a single process fails to observe its own previous writes.

If we identify a process as a session, adding process edges to the dependency graph used for Serializability gives Strong Session Serializable. Adding process edges to Snapshot Isolation gives Strong Session Snapshot Isolation, and so on.

Real-time

A real-time dependency means that one transaction completed before another in real time—that is to say, as measured by imaginary, perfectly synchronized wall clocks.⁸ We say that transaction T_j real-time depends on transaction T_i if T_i completes before T_j begins. Note that if T_i and T_j are concurrent, no real-time dependency exists.
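In code, the rule is a comparison of completion and invocation times. This sketch assumes each transaction carries a hypothetical (start, end) pair measured on the same perfectly synchronized clock imagined above.

```python
def realtime_edges(intervals):
    """Derive real-time edges as (T_i, T_j, "rt") triples.

    intervals: {transaction: (start_time, end_time)}  (assumed encoding)
    """
    edges = set()
    for t_i, (_, end_i) in intervals.items():
        for t_j, (start_j, _) in intervals.items():
            # T_j real-time depends on T_i only if T_i completed before T_j began.
            if t_i != t_j and end_i < start_j:
                edges.add((t_i, t_j, "rt"))
    return edges


edges = realtime_edges({"T1": (0, 2), "T2": (1, 3), "T3": (4, 5)})
# edges == {("T1", "T3", "rt"), ("T2", "T3", "rt")}
```

T1 and T2 overlap in time, so they are concurrent and no edge connects them in either direction.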

This is the same temporal order used by Linearizability. It (conservatively) captures the idea that information could have flowed from T_i to T_j, via a message sent the instant T_i was known to have committed (or aborted). Real-time dependencies let us describe phenomena like stale read, where one process completes a write, then at a later time, a second process begins a read which fails to observe that write.

Adding real-time dependencies to the graph for Serializability gives Strong Serializable. Adding them to Snapshot Isolation gives Strong Snapshot Isolation, and so on.

Since processes are single-threaded, each process completes one transaction before beginning its next, so every process dependency is also a real-time dependency. Enforcing real-time order therefore enforces process order as well.

¹ For a peek at the history of dependency-graph formalizations of consistency, see Gray, Lorie, Putzolu, & Traiger’s 1977 paper Granularity of Locks and Degrees of Consistency in a Shared Data Base.

² We follow Fekete et al. in calling dependencies and anti-dependencies “dependencies”, and in naming them write-write, write-read, and read-write, respectively. We use “operation” and “transaction” synonymously, to use the same language we use for non-transactional models. Our transactions are therefore composed of “micro-operations”. Our writes are general transformations of data, rather than blind register writes; this helps build intuition both for version orders and for understanding many Jepsen test results. We extend our graphs with a process order, like Adya’s real-time order, to capture session variants of consistency models.

³ For an alternative take on predicates, see Fekete et al., 2005, Making Snapshot Isolation Serializable. Their model allows predicate operations to use the entire database state, rather than filtering versions separately. They also omit predicate writes, modeling them as a predicate read followed by item writes.

⁴ This order is constrained in three ways. First, it is consistent with the order of micro-operations. Second, it ensures no operation reads a version before it is written. Third, it ensures reads observe the most recent write. For more on the event order, see Adya’s thesis, chapter three.

⁵ In Adya’s formalism, DSG(H) is purely write-write, write-read, and read-write dependencies; it does not include real-time or process orders.

⁶ This example demonstrates why Berenson et al.’s dirty write is overly broad: not all dirty writes violate serializability! This is why Adya’s write cycle requires a loop of write-write dependencies.

⁷ Trasnacktion.

⁸ We assume an inertial reference frame, a minimum latency of zero, and all processes at rest. If you are designing a consistency model for relativistic processes, you may wish to be somewhat more conservative and define a real-time order in terms of light cones.