JEPSEN

Serializability

Informally, serializability means that transactions appear to have occurred in some total order.

Serializability is a transactional model: operations (usually termed “transactions”) can involve several primitive sub-operations performed in order. Serializability guarantees that operations take place atomically: a transaction’s sub-operations do not appear to interleave with sub-operations from other transactions.

It is also a multi-object property: operations can act on multiple objects in the system. Indeed, serializability applies not only to the particular objects involved in a transaction, but to the system as a whole—operations may act on predicates, like “the set of all cats”.

Serializability cannot be totally or sticky available; in the event of a network partition, some or all nodes will be unable to make progress.

Serializability implies repeatable read, snapshot isolation, etc. However, it does not impose any real-time, or even per-process constraints. If process A completes write w, then process B begins a read r, r is not necessarily guaranteed to observe w. For those kinds of real-time guarantees, see strict serializable.

Moreover, serializability does not require a per-process order between transactions. A process can observe a write, then fail to observe that same write in a subsequent transaction. In fact, a process can fail to observe its own prior writes, if those writes occurred in different transactions.

The requirement for a total order of transactions is strong—but still allows pathological orderings. For instance, a serializable database can always return the empty state for any reads, by appearing to execute those reads at time 0. It can also discard write-only transactions by reordering them to execute at the very end of the history, after any reads. Operations like increments can also be discarded, assuming the result of the increment is never observed. Luckily, most implementations don’t seem to take advantage of these optimization opportunities.

Formally

The ANSI SQL 1999 spec says:

The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

… and goes on to define its isolation levels in terms of proscribed anomalies: serializable is read committed, but without phenomenon P3:

P3 (“Phantom”): SQL-transaction T1 reads the set of rows N that satisfy some <search condition>. SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.

However, as Berenson, Bernstein, et al observed, the ANSI specification allows multiple intepretations, and one of those interpretations (the “anomaly interpretation”) admits nonserializable histories.

Adya’s formalization of transactional isolation levels provides a more thorough summary of the preventative interpretation of the ANSI levels, defining serializability as the absence of four phenomena. Serializability prohibits:

  • P0 (Dirty Write): w1(x) … w2(x)
  • P1 (Dirty Read): w1(x) … r2(x)
  • P2 (Fuzzy Read): r1(x) … w2(x)
  • P3 (Phantom): r1(P) … w2(y in P)

Here w denotes a write, r denotes a read, and subscripts indicate the transaction which executed that operation. The notation “…” indicates a series of micro-operations except for a commit or abort. P indicates a predicate.

As Adya notes, the preventative interpretation of the ANSI specification is overly restrictive: it rules out some histories which are legally serializable.

For a reasonably intuitive formalization based on abstract executions, see Cerone, Bernardi, & Gotsman’s A Framework for Transactional Consistency Models with Atomic Visibility, which specifies serializability as a combination of three properties:

  • Internal consistency: within a transaction, reads observe that transaction’s most recent writes (if any)
  • External consistency: reads without a preceding write in transaction T1 must observe the state written by a transaction T0, such that T0 is visible to T1, and no more recent transaction wrote to that object.
  • Total visibility: the visibility relationship must be a total order.

For a state-based formalization, see Crooks, Pu, Alvisi, & Clement: Seeing is Believing: A Client-Centric Specification of Database Isolation.