JEPSEN

P3 (Phantom)

Phenomenon P3, also known as Phantom, occurs when one transaction changes data involved in a predicate read performed by another, ongoing transaction. (Fuzzy Read is Berenson et al's name for P2, the non-repeatable read of a single item.) The definition from Berenson et al is essentially this: transaction T_i reads predicate P, then transaction T_j writes some object x in P. Afterwards, T_i and T_j each commit or abort, in any order.

This definition hinges on what “x in P” means, and Berenson et al are somewhat vague on this point. Informally, we wish to ensure that whatever is observed by a predicate read is stable for the duration of the transaction. As usual, we model inserts and deletes as writes, and say that every predicate read implicitly selects a single version for every object in the database: VSet(P). Assume VSet(P) contains x_i, and T_j writes x_j. Clearly, if both versions of x match P, this could lead to a non-repeatable predicate read, or other invariant violations. However, we also want to handle cases where x_i does not match P (say, because it is unborn) and x_j matches P: a second read of P would see x popping into existence. Or the inverse: x_i matches P, but x_j does not, so x would vanish from a second read of P. We want “x in P” to cover all of these cases.
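The model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Jepsen's implementation: rows are plain dicts, `None` stands for an unborn or deleted version, and the helper names (`matches`, `predicate_read`, `x_in_p`) and the example predicate are hypothetical.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical model: a version is a row (dict) or None. None represents an
# unborn (not yet inserted) or dead (deleted) version, so inserts and deletes
# are simply writes that move an object to or from None.
Version = Optional[dict]
Predicate = Callable[[dict], bool]


def matches(pred: Predicate, version: Version) -> bool:
    """An unborn or deleted version never matches a predicate."""
    return version is not None and pred(version)


def predicate_read(pred: Predicate, vset: Dict[str, Version]) -> Set[str]:
    """Evaluate P over VSet(P), which holds one selected version per object.

    Objects absent from vset are treated as unborn, so they never match.
    """
    return {key for key, version in vset.items() if matches(pred, version)}


def x_in_p(pred: Predicate, x_i: Version, x_j: Version) -> bool:
    """Broad reading of "x in P": true if either version matches P."""
    return matches(pred, x_i) or matches(pred, x_j)


if __name__ == "__main__":
    positive = lambda row: row["balance"] > 0
    # Update: both versions match, so x's contents may change under P.
    print(x_in_p(positive, {"balance": 5}, {"balance": 7}))    # True
    # Insert: x would pop into existence in a second read of P.
    print(x_in_p(positive, None, {"balance": 5}))              # True
    # Delete: x would vanish from a second read of P.
    print(x_in_p(positive, {"balance": 5}, None))              # True
    # Neither version matches, so this write cannot affect a read of P.
    print(x_in_p(positive, {"balance": -1}, {"balance": -2}))  # False
```

Treating a missing version as "never matches" is what lets one rule cover updates, inserts, and deletes alike.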

Jepsen suggests the following definition: P3 occurs when transaction T_i performs a read of predicate P such that VSet(P) includes some version x_i. Then transaction T_j writes version x_j. At least one of x_i or x_j matches P. T_i and T_j may each commit or abort, in any order.
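As a rough sketch, this definition can be checked over a single total order of events. The Python below assumes a hypothetical event encoding (`PredRead`, `Write`, `End`) in which each predicate read carries its VSet(P) explicitly and `None` marks an unborn or deleted version. Following the informal description of P3 as a write during another, ongoing transaction's predicate read, it only counts writes that occur before T_i ends.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union

Version = Optional[dict]  # None: an unborn or deleted version


@dataclass
class PredRead:
    txn: str
    pred: Callable[[dict], bool]
    vset: Dict[str, Version]  # VSet(P); missing keys are treated as unborn


@dataclass
class Write:
    txn: str
    obj: str
    version: Version  # x_j; None models a delete


@dataclass
class End:
    txn: str  # a commit or an abort; P3 treats them alike


Event = Union[PredRead, Write, End]


def matches(pred: Callable[[dict], bool], version: Version) -> bool:
    return version is not None and pred(version)


def find_p3(history: List[Event]) -> List[Tuple[str, str, str]]:
    """Return (T_i, T_j, x) for every P3 instance in a totally ordered history."""
    found = []
    for i, read in enumerate(history):
        if not isinstance(read, PredRead):
            continue
        for event in history[i + 1:]:
            if isinstance(event, End) and event.txn == read.txn:
                break  # T_i has ended; later writes are not concurrent with it
            if not isinstance(event, Write) or event.txn == read.txn:
                continue
            x_i = read.vset.get(event.obj)  # version of x selected by the read
            # At least one of x_i or x_j must match P.
            if matches(read.pred, x_i) or matches(read.pred, event.version):
                found.append((read.txn, event.txn, event.obj))
    return found
```

For instance, if T1 reads `balance > 0` with VSet(P) = {x: balance 5}, and T2 then inserts y with balance 3, deletes x, and inserts z with balance -4 before either transaction ends, `find_p3` reports `("T1", "T2", "y")` and `("T1", "T2", "x")`, but nothing for z, since neither version of z matches P.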

P3 is defined in terms of a total order of events, and makes implicit assumptions about time and state that may not hold. For instance, suppose T_i committed before T_j began. It would be odd if T_j then performed a pair of predicate reads that first did not, and then did, reflect T_i’s writes. This is clearly not a repeatable predicate read, but because T_i is not concurrent with T_j, it does not meet this definition of P3. For this reason, Jepsen prefers G2, which is defined over dependencies between transactions rather than a total order of events.
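To make the gap concrete, here is a minimal Python sketch of that scenario under a hypothetical history encoding. It assumes some database produced this history (T_j first reading a state that omits T_i’s committed insert); that behavior is not modeled. An order-based P3 check finds nothing, even though T_j’s two reads of P disagree.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: ("w", txn, obj) writes obj; ("r", txn, keys) is a
# predicate read of P returning the set of matching keys; ("c", txn, None)
# and ("a", txn, None) commit or abort a transaction.
Event = Tuple[str, str, Union[str, FrozenSet[str], None]]

HISTORY: List[Event] = [
    ("w", "Ti", "y"),               # T_i inserts y, which matches P
    ("c", "Ti", None),              # T_i commits before T_j begins
    ("r", "Tj", frozenset()),       # T_j's first read of P does not see y
    ("r", "Tj", frozenset({"y"})),  # its second read of P does
    ("c", "Tj", None),
]


def has_p3(history: List[Event], in_p: Set[str]) -> bool:
    """Order-based P3: a read of P, then another txn's write to x in P,
    while the reader is still active."""
    for i, (kind, reader, _) in enumerate(history):
        if kind != "r":
            continue
        for kind2, txn, obj in history[i + 1:]:
            if kind2 in ("c", "a") and txn == reader:
                break  # the reader has ended
            if kind2 == "w" and txn != reader and obj in in_p:
                return True
    return False


def reads_differ(history: List[Event], txn: str) -> bool:
    """True if txn's predicate reads of P returned different results."""
    results = {arg for kind, t, arg in history if kind == "r" and t == txn}
    return len(results) > 1
```

Here `has_p3(HISTORY, {"y"})` is false, because the only write to y precedes both reads, while `reads_differ(HISTORY, "Tj")` is true.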

Literature

Phantom comes from the ANSI SQL standard, which says:

P3 (“Phantom”): SQL-transaction T1 reads the set of rows N that satisfy some <search condition>. SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.

This description is famously bad. Is the final sentence meant to be an example, or required? What about deletes? What about a write that causes a row to no longer match the search condition? In 1995 Berenson et al summarized the problems with ANSI SQL’s definitions and offered two possible interpretations of Phantom. The strict interpretation, A3, requires two reads of P, and that both transactions commit. The broad interpretation, P3, requires only one read, and covers any combination of commits and aborts.

P3: r1[P] … w2[y in P] … ((c1 or a1) and (c2 or a2) in any order)

A3: r1[P] … w2[y in P] … c2 … r1[P] … c1
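The two patterns can be transcribed almost literally. A minimal Python sketch, assuming histories are lists of `(kind, txn, arg)` tuples and that “y in P” is supplied as a fixed set `in_p` (which sidesteps the ambiguity about what “in P” means); all names are hypothetical:

```python
from typing import Iterator, List, Optional, Set, Tuple

# An operation is (kind, txn, arg): ("r", 1, "P") reads predicate P,
# ("w", 2, "y") writes object y, and ("c", 1, None) / ("a", 1, None)
# commit or abort. Histories are assumed well-formed: each transaction
# ends at most once and performs no operations after it ends.
Op = Tuple[str, int, Optional[str]]


def _read_then_write(h: List[Op], in_p: Set[str], pred: str) -> Iterator[Tuple[int, int]]:
    """Yield (i, j) where h[i] is r1[P] and h[j] is w2[y in P], with T1 still active."""
    for i, (kind, t1, arg) in enumerate(h):
        if (kind, arg) != ("r", pred):
            continue
        for j in range(i + 1, len(h)):
            kind2, t2, obj = h[j]
            if kind2 in ("c", "a") and t2 == t1:
                break  # T1 has ended
            if kind2 == "w" and t2 != t1 and obj in in_p:
                yield i, j


def p3(h: List[Op], in_p: Set[str], pred: str = "P") -> bool:
    """r1[P] ... w2[y in P] ... ((c1 or a1) and (c2 or a2) in any order)"""
    return any(True for _ in _read_then_write(h, in_p, pred))


def a3(h: List[Op], in_p: Set[str], pred: str = "P") -> bool:
    """r1[P] ... w2[y in P] ... c2 ... r1[P] ... c1"""
    for i, j in _read_then_write(h, in_p, pred):
        t1, t2 = h[i][1], h[j][1]
        c2 = h.index(("c", t2, None)) if ("c", t2, None) in h else -1
        c1 = h.index(("c", t1, None)) if ("c", t1, None) in h else -1
        # T2 must commit after its write, and T1 must reread P before committing.
        if c2 > j and c1 > c2 and ("r", t1, pred) in h[c2 + 1:c1]:
            return True
    return False
```

For example, `r1[P] w2[y] c2 c1` matches P3 but not A3, since T1 reads P only once; so does `r1[P] w2[y] a2 r1[P] c1`, because A3 requires T2 to commit.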

Both A3 and P3 prevent a specific pattern that leads to a transaction observing different values for a repeated predicate read. However, A3 fails to prevent many predicate-related anomalies. Imagine T_i withdraws $10 from account x, T_j performs a predicate read of all accounts, and T_i then deposits $10 into account y. If the balances initially sum to zero, the predicate read observes a state of the world in which the sum of all accounts is -$10, rather than zero: it sees part, but not all, of T_i’s effects. Because T_j reads the predicate only once, A3 does not apply. P3, by contrast, prevents this: T_i writes y, which is in P, after T_j’s predicate read.
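A small simulation of this interleaving, as a sketch: it assumes hypothetical starting balances of zero for both accounts (the text only implies that the total is zero), runs the steps with no isolation at all, and uses hypothetical helper names.

```python
from typing import List, Optional, Tuple

# Step: (kind, txn, target, delta). "w" adds delta to an account; "r" is a
# predicate read of P = "all accounts" and carries no delta.
Step = Tuple[str, str, str, Optional[int]]

# Hypothetical starting balances: the text implies they sum to zero.
INITIAL = {"x": 0, "y": 0}

# The interleaving from the text, run with no isolation: every write is
# visible immediately. Commits are omitted; both transactions end afterwards.
SCHEDULE: List[Step] = [
    ("w", "Ti", "x", -10),   # T_i withdraws $10 from x
    ("r", "Tj", "P", None),  # T_j reads all accounts
    ("w", "Ti", "y", +10),   # T_i deposits $10 into y
]


def run(schedule, initial):
    """Apply the schedule; return (totals seen by each read of P, final total)."""
    db = dict(initial)
    observed = []
    for kind, _txn, target, delta in schedule:
        if kind == "w":
            db[target] += delta
        else:
            observed.append(sum(db.values()))
    return observed, sum(db.values())


def violates_p3(schedule):
    """P3: a read of P, then another transaction's write to an object in P.

    Every account matches P ("all accounts"), so any later write by a
    different transaction qualifies.
    """
    for i, (kind, reader, _, _) in enumerate(schedule):
        if kind == "r" and any(
            k == "w" and t != reader for k, t, _, _ in schedule[i + 1:]
        ):
            return True
    return False


def reads_of_p(schedule, txn):
    """Count txn's reads of P; A3 needs at least two."""
    return sum(1 for k, t, _, _ in schedule if k == "r" and t == txn)
```

`run(SCHEDULE, INITIAL)` returns `([-10], 0)`: the read sees a total of -$10 although the final total is zero. `violates_p3(SCHEDULE)` is true, while `reads_of_p(SCHEDULE, "Tj")` is 1, so A3 cannot match.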

As of 2024, the ANSI standard’s definition of isolation levels is unchanged. Whether Phantom includes deletions remains unspecified. Berenson et al argue P3 is the correct interpretation, and Jepsen concurs.
