Friday 20 December 2013

Types of Reads and Levels of Transaction Isolation in PostgreSQL

I see many people asking differences between different types of reads and transaction isolation levels of PostgreSQL. Here you go:

Q1: Types of Reads:
=============
1. dirty reads: 

If one transaction were able to see the uncommitted work of another transaction, that would be a dirty read. That would generally be a very bad thing, since that work might never be committed.

2. non-repeatable reads

If a transaction were able to read a row, and then later read the same row and see different values based on the work of another transaction, that would be a non-repeatable read.

3. phantom reads

If a transaction were able to read a set of rows matching some logical condition, and later read that same logical set of rows and see new ones which qualify because of the work of another transaction, that would be a phantom read. A transaction always sees its own work.

Q2: Levels of Transaction Isolation:
======================
1. TRANSACTION_NONE

This is not a real transaction isolation level. This is defined in the JDBC specification specifically to deal with drivers which don't support transaction isolation levels. It does not match to any transaction isolation level in the SQL standard, and should not be used with a driver which does support transaction isolation levels.

The following isolation levels do correspond to transaction isolation levels described in the SQL standard. Three of these levels are defined by which of the phenomena from Q1 are allowed; the fourth is not defined by the above phenomena, although the standard notes that due to how it is defined, none of the above phenomena are possible. It is also worth noting that the standard does not require any of these phenomena to be allowed at any level; an implementation is conforming to the standard if it is more strict, but not if it is less strict. Among other things, this means that it is OK for a transaction isolation level's implementation to be the same as a more strict transaction isolation level's implementation.

Because PostgreSQL uses Multi-Version Concurrency Control (MVCC) for managing concurrency, it takes “snapshots” as a key component of how it implements these transaction isolation levels. A snapshot controls which other transactions' work will be visible. The work of transactions which have committed at the moment a snapshot is taken will be visible. The work of any transactions which have rolled back will not be visible, nor will the work of any transactions which are active or which have not yet begun when the snapshot was taken.

2. TRANSACTION_READ_COMMITTED

In the SQL standard, this transaction isolation level must not allow dirty reads, but non-repeatable reads and phantom reads are allowed. In PostgreSQL dirty reads are not possible and non-repeatable and phantom reads can, in fact, happen.

PostgreSQL takes a snapshot at the start of each statement, after it has parsed that statement and analyzed it sufficiently to determine that a snapshot is in fact required. This provides the statement with a stable view of the database for the duration of that statement, with one exception: if an UPDATE or DELETE statement finds that it is about to modify a row which a concurrent, uncommitted transaction has already updated or deleted, it blocks and waits for the other transaction to complete. If that other transaction rolls back rather than committing, the UPDATE or DELETE can proceed normally. If the other transaction commits, PostgreSQL will look for the newest version of the row created by updates; if the row has not been deleted and if it still matches the selection criteria for the query, that new version is updated or deleted. Because the statement is no longer working with a single snapshot of the database after blocking for a concurrent transaction's work, there are rare conditions under which this can produce surprising results.

3. TRANSACTION_READ_UNCOMMITTED

In the SQL standard, any of the phenomena described in Q1 are allowed, including dirty reads. In PostgreSQL this is implemented the same as TRANSACTION_READ_COMMITTED; dirty reads are not, in fact, allowed. This is permitted by the SQL standard.

4. TRANSACTION_REPEATABLE_READ

In the SQL standard, this transaction isolation level must not allow dirty reads or non-repeatable reads, but phantom reads are allowed. In PostgreSQL none of these phenomena can, in fact, happen.

PostgreSQL takes a snapshot the first time within such a transaction that it finds that it needs one. This does not happen with the statement which starts the transaction, nor for many of the utility statements such as LOCK TABLE or SET READ ONLY which might be executed near the beginning of the transaction. This is by design, and is an important point to understand when using explicit locking; if you acquire a lock after the snapshot is taken, you might block because of a concurrent update to a row, but if you read that row after the blocking transaction commits, you will see the value according to the snapshot – that is, the old version before the concurrent update. Explicit locks should normally be taken at the start of the transaction, before a snapshot has been acquired, to prevent this.

Once a repeatable read transaction has acquired a snapshot, it uses that for the remainder of the transaction. This technique is called “snapshot isolation” and provides more guarantees than required for repeatable read, but falls short of the requirements of the standard for serializable transactions. It is worth noting that versions of PostgreSQL prior to 9.1 used this transaction isolation level when a serializable transaction was requested. At least one other database product still does this, although it allows certain well-known serialization anomalies to occur.

5. TRANSACTION_SERIALIZABLE

In the SQL standard, there is a requirement that any set of concurrent serializable transactions produces an effect which is consistent with some serial (one-at-a-time) execution of those transactions. The standard notes that due to this definition, none of the phenomena described in Q1 will be possible. Although this definition of the serializable transaction isolation level has been present in every version of the SQL standard for almost 20 years, confusion persists – one can still find mistaken assertions that avoiding the three phenomena mentioned in Q1 is not only necessary but also sufficient to comply with the standard.

In version 9.1 and later PostgreSQL uses a technique called Serializable Snapshot Isolation (often abbreviated as SSI), to implement the serializable transaction isolation level. Rather than using blocking to implement the desired transaction isolation, it allows snapshot isolation (described above for repeatable read transactions) to operate, while monitoring for the conditions under which a serialization anomaly could cause problems. If such a situation occurs, a transaction will be rolled back with a serialization failure (SQLSTATE 40001) and should be retried from the beginning.

In versions prior to 9.1 there was no difference between serializable and repeatable read transactions; the behavior provided by both transaction isolation levels in earlier releases is still available as the repeatable read transaction isolation level in version 9.1 and later.

No comments:

Post a Comment