JEPSEN

P1 (Dirty Read)

Phenomenon P1, or Dirty Read, occurs when one transaction reads data from another in-progress transaction. Specifically, transaction Ti writes some version xi, and a different transaction Tj reads xi before Ti completes. Both Ti and Tj may commit or abort.

P1 is defined in terms of a total order of events, and makes implicit assumptions about time and state that may not hold true. For instance, if Ti aborts and then Tj begins, it is clearly bad if Tj reads xi. Real-world systems do this all the time, for instance when Tj reads from a cache or a different replica. However, this is not P1, because the read happens after Ti has completed rather than while Ti is still in progress. For this reason, Jepsen prefers G1a, G1b, and G1c.

Literature

Dirty Read comes from the ANSI SQL standard, which says:

P1 (“Dirty read”): SQL-transaction T1 modifies a row. SQL-transaction T2 then reads that row before T1 performs a COMMIT. If T1 then performs a ROLLBACK, T2 will have read a row that was never committed and that may thus be considered to have never existed.

This description is famously ambiguous: is the final sentence meant to be an example, or a requirement? In 1995, Berenson et al. summarized the problems with ANSI SQL’s definitions and offered two possible interpretations of Dirty Read. The strict interpretation, A1, requires that the writer abort and the reader commit. The broad interpretation, P1, includes any combination of commits and aborts.

  • P1: w1[x] … r2[x] … ((c1 or a1) and (c2 or a2) in any order)
  • A1: w1[x] … r2[x] … (a1 and c2 in any order)
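The two patterns can be checked mechanically against a totally ordered history. What follows is a minimal sketch, not Jepsen's or Berenson's implementation: the `Op` type and the `dirty_reads`, `has_p1`, and `has_a1` names are hypothetical, and it assumes a complete history over a single-version store, so any read of x between w1[x] and T1's commit or abort is treated as observing T1's write.

```python
from typing import List, NamedTuple, Optional, Tuple


class Op(NamedTuple):
    kind: str                  # "w" (write), "r" (read), "c" (commit), "a" (abort)
    txn: int
    key: Optional[str] = None  # None for commit and abort


def dirty_reads(history: List[Op]) -> List[Tuple[int, int, Optional[str], str, str]]:
    """Return (writer, reader, key, writer_outcome, reader_outcome) for each
    read of a key by another transaction while the writer is still running."""
    # Position and kind of each transaction's commit or abort.
    end = {op.txn: (i, op.kind) for i, op in enumerate(history)
           if op.kind in ("c", "a")}
    found = []
    for i, w in enumerate(history):
        if w.kind != "w" or w.txn not in end:
            continue
        w_end, w_outcome = end[w.txn]
        # Only reads strictly between the write and the writer's end count.
        for r in history[i + 1:w_end]:
            if (r.kind == "r" and r.key == w.key
                    and r.txn != w.txn and r.txn in end):
                found.append((w.txn, r.txn, w.key, w_outcome, end[r.txn][1]))
    return found


def has_p1(history: List[Op]) -> bool:
    # Broad interpretation: any combination of commits and aborts.
    return bool(dirty_reads(history))


def has_a1(history: List[Op]) -> bool:
    # Strict interpretation: the writer aborts and the reader commits.
    return any(w_out == "a" and r_out == "c"
               for _, _, _, w_out, r_out in dirty_reads(history))


abort_commit = [Op("w", 1, "x"), Op("r", 2, "x"), Op("a", 1), Op("c", 2)]
commit_commit = [Op("w", 1, "x"), Op("r", 2, "x"), Op("c", 1), Op("c", 2)]
abort_abort = [Op("w", 1, "x"), Op("r", 2, "x"), Op("a", 1), Op("a", 2)]
read_after_abort = [Op("w", 1, "x"), Op("a", 1), Op("r", 2, "x"), Op("c", 2)]

for name, h in [("abort_commit", abort_commit),
                ("commit_commit", commit_commit),
                ("abort_abort", abort_abort),
                ("read_after_abort", read_after_abort)]:
    print(name, "P1:", has_p1(h), "A1:", has_a1(h))
```

Under these assumptions, only `abort_commit` matches A1, while the first three histories all match P1. The `read_after_abort` history matches neither pattern, because the read falls after a1 rather than between w1[x] and a1.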

Prohibiting either A1 or P1 prevents aborted reads, but prohibiting P1 goes further. For example, it also prevents G1b (Intermediate Read), where Tj reads a non-final version written by some committed Ti. Likewise, prohibiting P1 prevents some types of G1c (Circular Information Flow) which A1 allows, since concurrent transactions can never observe each other’s writes.
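To make the intermediate-read case concrete, here is a small sketch of a history in which T2 reads a version of x that T1 later overwrites before committing. The event layout and the `intermediate_reads` function are hypothetical, and the sketch assumes every write to a key uses a distinct value, so that a read can be traced back to the transaction that wrote it.

```python
# Each event is (op, txn, key, value); key and value are None for commit/abort.
g1b_history = [
    ("w", 1, "x", 1),        # T1 writes an intermediate version of x
    ("r", 2, "x", 1),        # T2 reads it while T1 is still running
    ("w", 1, "x", 2),        # T1 overwrites x with its final version
    ("c", 1, None, None),
    ("c", 2, None, None),
]

clean_history = [
    ("w", 1, "x", 1),
    ("w", 1, "x", 2),
    ("c", 1, None, None),
    ("r", 2, "x", 2),        # T2 reads only T1's final, committed version
    ("c", 2, None, None),
]


def intermediate_reads(history):
    """Return (writer, reader, key, value) for each committed read of a
    non-final version written by a different committed transaction."""
    committed = {txn for op, txn, _, _ in history if op == "c"}
    writer = {}  # (key, value) -> transaction that wrote that value
    final = {}   # (txn, key)   -> last value that transaction wrote to key
    for op, txn, key, value in history:
        if op == "w":
            writer[(key, value)] = txn
            final[(txn, key)] = value
    hits = []
    for op, txn, key, value in history:
        if op != "r" or txn not in committed:
            continue
        w = writer.get((key, value))
        if w is None or w == txn or w not in committed:
            continue
        if final[(w, key)] != value:
            hits.append((w, txn, key, value))
    return hits


print(intermediate_reads(g1b_history))
print(intermediate_reads(clean_history))
```

In `g1b_history` the read sits between w1[x] and c1, so the history matches the P1 pattern; it does not match A1, because T1 commits rather than aborts.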

As of 2024, the ANSI standard’s definition of isolation levels is unchanged. Whether ANSI Read Committed prevents G1b and G1c is still unclear. In general, Jepsen prefers Berenson’s P1.

See Also

Adya’s G1a, G1b, and G1c capture “what we think the standard meant to say” in three more general, precise phenomena. The Adya formalism relies on dataflow, rather than assuming a single database state with real-time order. P1 is also arguably broader than necessary: it still occurs even if both writer and reader abort. G1, by contrast, occurs only if the reader commits.