JEPSEN

Snapshot Isolation

In a snapshot isolated system, each transaction appears to operate on an independent, consistent snapshot of the database. Its changes are visible only to that transaction until commit time, when all changes become visible atomically to any transaction which begins at a later time. If transaction T1 has modified an object x, and another transaction T2 committed a write to x after T1’s snapshot began, and before T1’s commit, then T1 must abort.

Snapshot isolation is a transactional model: operations (usually termed “transactions”) can involve several primitive sub-operations performed in order. It is also a multi-object property: operations can act on multiple objects in the system.

Snapshot isolation cannot be totally available; in the presence of network partitions, some or all nodes may be unable to make progress. For total availability, at the cost of allowing long-fork anomalies, consider parallel snapshot isolation, or (weaker, but more commonly supported) read committed.

Unlike serializability, which enforces a total order of transactions, snapshot isolation only forces a partial order: sub-operations in one transaction may interleave with those from other transactions. The most notable phenomena allowed by snapshot isolation are write skews, which allow transactions to read overlapping state, modify disjoint sets of objects, then commit; and a read-only transaction anomaly, involving partially disjoint write sets.

Snapshot isolation implies read committed. However, it does not impose any real-time constraints. If process A completes write w, then process B begins a read r, r is not necessarily guaranteed to observe w. Some databases provide real-time variants of snapshot isolation. Compare with strict serializability, which provides a total order and real-time guarantees.

Moreover, snapshot isolation does not require a per-process order between transactions. A process can observe a write, then fail to observe that same write in a subsequent transaction. In fact, a process can fail to observe its own prior writes, if those writes occurred in different transactions. To enforce that transactions from the same process appear to execute in order, consider prefix-consistent snapshot isolation.

In its generalized form, snapshot isolation allows pathological orderings. For instance, a snapshot isolated database can always return the empty state for any reads, by appearing to execute those reads at time 0. It can also discard write-only transactions by reordering them to execute at the very end of the history, after any reads. Operations like increments can also be discarded, assuming the result of the increment is never observed. Luckily, most implementations don’t seem to take advantage of these optimization opportunities.

A Range of Snapshot Isolations

Papers vary in how strictly they constrain snapshot isolation’s allowed histories. The original paper by Berenson et al is somewhat ambiguous about the relationship between commit and start timestamps and wall-clock time, and subsequent work by Daudjee & Salem established “classic/strong” and “generalized/weak” SI levels. Adya’s formalism for SI assumes that we take as given specific choices made by the scheduler: whether a history is SI or not depends on what the scheduler does. This gives rise to a range of possible snapshot isolations depending on how the scheduler is implemented.

If one interprets SI as enforcing that transactions always choose a start time higher than every committed transaction’s commit time, then snapshot isolation is clearly incomparable to serializability: it forces a real-time order which serializability does not.

Following extensive conversation with Bailis, Crooks, Fekete, Hellerstein, Sutra, and Shapiro, Jepsen interprets snapshot isolation in a more relaxed way: we take SI to prohibit dependency cycles between transactions which contain more than one adjacent read-write anti-dependency edge. Under this generalized definition, snapshot isolation is strictly weaker than serializability. This definition is compatible with a broader set of implementations.

Formally

Berenson, Bernstein, et al first defined snapshot isolation in terms of an abstract algorithm:

… each transaction reads reads data from a snapshot of the (committed) data as of the time the transaction started, called its Start-Timestamp. This time may be any time before the transaction’s first Read. A transaction running in Snapshot Isolation is never blocked attempting a read as long as the snapshot data from its Start-Timestamp can be maintained. The transaction’s writes (updates, inserts, and deletes) will also be reflected in this snapshot, to be read again if the transaction accesses (i.e., reads or updates) the data a second time. Updates by other transactions active after the transaction Start-Timestamp are invisible to the transaction.

When the transaction T1 is ready to commit, it gets a Commit-Timestamp, which is larger than any existing Start-Timestamp or Commit-Timestamp. The transaction successfully commits only if no other transaction T2 with a Commit-Timestamp in T1’s execution interval [StartTimestamp, Commit-Timestamp] wrote data that T1 also wrote. Otherwise, T1 will abort. This feature, called First-committer-wins prevents lost updates (phenomenon P4). When T1 commits, its changes become visible to all transactions whose Start-Timestamps are larger than T1‘s Commit-Timestamp.

To obtain per-process orders, or real-time orders, we can add constraints on start and commit timestamps.

For a reasonably intuitive formalization based on abstract executions, see Cerone, Bernardi, & Gotsman’s A Framework for Transactional Consistency Models with Atomic Visibility, which specifies snapshot isolation as a combination of four properties:

  • Internal consistency: within a transaction, reads observe that transaction’s most recent writes (if any)
  • External consistency: reads without a preceding write in transaction T1 must observe the state written by a transaction T0, such that T0 is visible to T1, and no more recent transaction wrote to that object.
  • Prefix: transactions become visible to all nodes in the same order
  • NoConflict: if two transactions write to the same object, one must be visible to the other.

For a state-based formalization, see Crooks, Pu, Alvisi, & Clement: Seeing is Believing: A Client-Centric Specification of Database Isolation.