Friday, 20 December 2013

Types of Reads and Levels of Transaction Isolation in PostgreSQL

I see many people asking differences between different types of reads and transaction isolation levels of PostgreSQL. Here you go:

Q1: Types of Reads:
1. dirty reads: 

If one transaction were able to see the uncommitted work of another transaction, that would be a dirty read. That would generally be a very bad thing, since that work might never be committed.

2. non-repeatable reads

If a transaction were able to read a row, and then later read the same row and see different values based on the work of another transaction, that would be a non-repeatable read.

3. phantom reads

If a transaction were able to read a set of rows matching some logical condition, and later read that same logical set of rows and see new ones which qualify because of the work of another transaction, that would be a phantom read. A transaction always sees its own work.

Q2: Levels of Transaction Isolation:

This is not a real transaction isolation level. This is defined in the JDBC specification specifically to deal with drivers which don't support transaction isolation levels. It does not match to any transaction isolation level in the SQL standard, and should not be used with a driver which does support transaction isolation levels.

The following isolation levels do correspond to transaction isolation levels described in the SQL standard. Three of these levels are defined by which of the phenomena from Q1 are allowed; the fourth is not defined by the above phenomena, although the standard notes that due to how it is defined, none of the above phenomena are possible. It is also worth noting that the standard does not require any of these phenomena to be allowed at any level; an implementation is conforming to the standard if it is more strict, but not if it is less strict. Among other things, this means that it is OK for a transaction isolation level's implementation to be the same as a more strict transaction isolation level's implementation.

Because PostgreSQL uses Multi-Version Concurrency Control (MVCC) for managing concurrency, it takes “snapshots” as a key component of how it implements these transaction isolation levels. A snapshot controls which other transactions' work will be visible. The work of transactions which have committed at the moment a snapshot is taken will be visible. The work of any transactions which have rolled back will not be visible, nor will the work of any transactions which are active or which have not yet begun when the snapshot was taken.


In the SQL standard, this transaction isolation level must not allow dirty reads, but non-repeatable reads and phantom reads are allowed. In PostgreSQL dirty reads are not possible and non-repeatable and phantom reads can, in fact, happen.

PostgreSQL takes a snapshot at the start of each statement, after it has parsed that statement and analyzed it sufficiently to determine that a snapshot is in fact required. This provides the statement with a stable view of the database for the duration of that statement, with one exception: if an UPDATE or DELETE statement finds that it is about to modify a row which a concurrent, uncommitted transaction has already updated or deleted, it blocks and waits for the other transaction to complete. If that other transaction rolls back rather than committing, the UPDATE or DELETE can proceed normally. If the other transaction commits, PostgreSQL will look for the newest version of the row created by updates; if the row has not been deleted and if it still matches the selection criteria for the query, that new version is updated or deleted. Because the statement is no longer working with a single snapshot of the database after blocking for a concurrent transaction's work, there are rare conditions under which this can produce surprising results.


In the SQL standard, any of the phenomena described in Q1 are allowed, including dirty reads. In PostgreSQL this is implemented the same as TRANSACTION_READ_COMMITTED; dirty reads are not, in fact, allowed. This is permitted by the SQL standard.


In the SQL standard, this transaction isolation level must not allow dirty reads or non-repeatable reads, but phantom reads are allowed. In PostgreSQL none of these phenomena can, in fact, happen.

PostgreSQL takes a snapshot the first time within such a transaction that it finds that it needs one. This does not happen with the statement which starts the transaction, nor for many of the utility statements such as LOCK TABLE or SET READ ONLY which might be executed near the beginning of the transaction. This is by design, and is an important point to understand when using explicit locking; if you acquire a lock after the snapshot is taken, you might block because of a concurrent update to a row, but if you read that row after the blocking transaction commits, you will see the value according to the snapshot – that is, the old version before the concurrent update. Explicit locks should normally be taken at the start of the transaction, before a snapshot has been acquired, to prevent this.

Once a repeatable read transaction has acquired a snapshot, it uses that for the remainder of the transaction. This technique is called “snapshot isolation” and provides more guarantees than required for repeatable read, but falls short of the requirements of the standard for serializable transactions. It is worth noting that versions of PostgreSQL prior to 9.1 used this transaction isolation level when a serializable transaction was requested. At least one other database product still does this, although it allows certain well-known serialization anomalies to occur.


In the SQL standard, there is a requirement that any set of concurrent serializable transactions produces an effect which is consistent with some serial (one-at-a-time) execution of those transactions. The standard notes that due to this definition, none of the phenomena described in Q1 will be possible. Although this definition of the serializable transaction isolation level has been present in every version of the SQL standard for almost 20 years, confusion persists – one can still find mistaken assertions that avoiding the three phenomena mentioned in Q1 is not only necessary but also sufficient to comply with the standard.

In version 9.1 and later PostgreSQL uses a technique called Serializable Snapshot Isolation (often abbreviated as SSI), to implement the serializable transaction isolation level. Rather than using blocking to implement the desired transaction isolation, it allows snapshot isolation (described above for repeatable read transactions) to operate, while monitoring for the conditions under which a serialization anomaly could cause problems. If such a situation occurs, a transaction will be rolled back with a serialization failure (SQLSTATE 40001) and should be retried from the beginning.

In versions prior to 9.1 there was no difference between serializable and repeatable read transactions; the behavior provided by both transaction isolation levels in earlier releases is still available as the repeatable read transaction isolation level in version 9.1 and later.

Saturday, 14 December 2013

Physical Machine Vs Virtual Machine

1)Databases on (VM's) guest operating systems by design when they start they grab blocks of a resource and manage it directly for performance reasons. As soon as you make the core operating system of the database server a guest in virtualized hosting environment then you are placing an arbitration layer with the hypervisor between the block allocated element of disk and RAM and the database server. It will slow down. The more inefficient your queries, the more it will slow.To be clear, that a finely tuned, massively busy, high performance database server should have its own physical hardware.

2) Not Recommended to go with a virtualized deployment because of the I/O penalty in VM its hard to know exactly what the performance penalty is but there is one. It is intrinsically more complex and harder to trace performance problems and does not allow disk I/O bandwidth to be allocated on a per-virtual server basis.

3)There is no doubt that virtualization requires extra hardware resources. The problem is that it is almost impossible to estimate in advance how many extra resources will be needed. We know that there are capacity planning guides and tools but from our experience every piece of software behaves differently in a virtualized environment. We have applications that are quite modest as long as they run on a physical server, but when they were virtualized their resource requirement multiplied.

4)If They are running multiple servers on the same host, the IO situation gets worse: it becomes even more important to carefully manage how many  Servers end up on a single physical host, and more difficult to balance the IO requirements of each server.

5)Unit of DB performance per unit of Hardware is a bit lower for a virtualized db. This means it requires more hardware to get the same level of performance compare to physical server.A common resource problem to look out for is adding additional VM's and thinning out the available resources or allocation of existing resources among virtual servers.And when it shows a sign of getting affected by OS resources it does not remain a recommended option to host large production database in virtual environment.

6)Level of interaction would increase the cost of processes on the virtual server due to its extra layer which could be a potential threat for CPU consumption and writing latency.

Tuesday, 10 December 2013

Step By Step Guide to setup Steaming Replication.

Though many know about streaming replication, I would like to start from Prerequisites and with some Introduction about this replication as this blog is for beginners :-).


1. Same version of the PostgreSQL Database must be installed on both servers. 

2. Configure the Password-less ssh authentication to use the “postgres” user. 

3. Production server must operate in WAL archiving enabled mode by setting archive_mode and archive_command in postgresql.conf file.

4. Production and stand by servers always should have connectivity between them to transfer the archived WAL files from production. 

5. Set up your standby host's environment and directory structure exactly the same as your primary. 


It is an asynchronous mechanism; the standby server lags behind the master. But unlike other replication methods, this lag is very short, and can be as little as a single transaction, depending on network speed, database activity, and Streaming replication settings. Also, the load on the master for each slave is minimal, allowing a single master to support dozens of slaves. 

The feature is included in Postgresql-9.0, with this the second database instance (normally on a separate server) replaying the primary's binary log, while making that standby server can accept read-only queries.

Here are the practical steps with necessary commands:

1. Connect to Master and create a "replication" user with replication privilege.

$ psql
Password for user postgres:
psql.bin (9.2.1) Type "help" for help.
postgres=# create user replication with replication password '<password>';
2. We need to change some parameters in postgresql.conf and authentication in pg_hba.conf which are located at  /opt/PostgreSQL92/data/ location on Master. Set up connections and authentication so that the standby server can successfully connect to the replication pseudo-database on the primary.
 $ $EDITOR postgresql.conf

listen_addresses = '*'

$ $EDITOR pg_hba.conf

#The standby server must have superuser access privileges.
host replication replication md5
3. Set up the streaming replication related parameters on the primary server.
 $EDITOR postgresql.conf
#To enable read­only queries on a standby server, wal_level must be set to "hot_standby". But you can choose "archive" if you never connect to the server in standby mode.

wal_level = hot_standby

#Set the maximum number of concurrent connections from the standby servers.

max_wal_senders = 5 

#To prevent the primary server from removing the WAL segments required for the standby server before shipping them, set the minimum number of segments retained in the pg_xlog directory. 

wal_keep_segments = 32

#Enable WAL archiving on the primary to an archive directory accessible from the standby. If wal_keep_segments is a high enough number to retain the WAL segments required for the standby server, this may not be necessary.

archive_mode    = on 

archive_command = 'cp %p <archive location>%f && scp %p postgres@<archive location>/%f'

Note: Restart the cluster after modifying the above parameters in postgresql.conf file.
4. Re-start postgres on the primary server and check if the parameters are affected.
postgres=# show archive_command ;
­­­­­­­­­­­­­­­­­­­­­­­­­­­­­      -----------------
 cp %p /var/PG92_Archives/%f
(1 row)
postgres=# show archive_mode ;
(1 row)
postgres=# show wal_level ;
(1 row)
postgres=# show max_wal_senders ;
(1 row)
postgres=# show wal_keep_segments ;
5. Make a base backup of Master server's data directory.
$ psql ­c "SELECT pg_start_backup('label', true)" 
$ cp /opt/PostgreSQL92/data/* backup/ 
$psql ­c "SELECT pg_stop_backup()" 
-- tar the backup directory and move to standby location. 
$tar ­cvzf backup.tar backup/

$scp backup.tar postgres@ 
6. Move the slave data directory contents to any other location, untar the backup file and copy contents to slave data directory.
7. Set up replication-related parameters, connections and authentication in the standby server like the primary, so that the standby might work as a primary after failover. 8. Enable read-only queries on the standby server. But if wal_level is archive on the primary, leave hot_standby unchanged (i.e., off).
$ $EDITOR postgresql.conf
hot_standby = on
9. Create a recovery command file in the standby server; the following parameters are required for streaming replication.
$ $EDITOR recovery.conf

# Specifies whether to start the server as a standby. In streaming
# replication, this parameter must to be set to on.
standby_mode          = 'on'
# Specifies a connection string which is used for the standby server to
# connect with the primary.

primary_conninfo      = 'host= port=5432 user=replication
password=<password> application=<app_name>'

# Specifies a trigger file whose presence should cause streaming
# replication to end (i.e., failover).

trigger_file = '' ===> Do not create the file. You have to
create the file when failover.

# Specifies a command to load archive segments from the WAL archive. If
# # wal_keep_segments is a high enough number to retain the WAL segments
# # required for the standby server, this may not be necessary. But
# a large workload can cause segments to be recycled before the standby
# is fully synchronized, requiring you to start again from a new base
# backup.

restore_command = 'cp %f "%p"' 
10. Start postgres in the standby server. It will start streaming replication and you will see log messages like below:
LOG:  entering standby mode
LOG:  consistent recovery state reached at 0/1D000078
LOG:  record with zero length at 0/1D000078
LOG:  streaming replication successfully connected to primary 
11. You can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby. They can be retrieved using pg_current_xlog_location function on the  primary side and the pg_last_xlog_receive_location or pg_last_xlog_replay_location function on the standby, respectively.
$ psql ­c "SELECT pg_current_xlog_location()" ­h192.168.0.10
(primary host)
(1 row)

$ psql ­c "select pg_last_xlog_receive_location()" ­h192.168.0.20
(standby host)
(1 row)
$ psql ­c "select pg_last_xlog_replay_location()" ­h192.168.0.20
(standby host)
(1 row)
12. Other ways to check streaming replication: the easiest way is "select now()-pg_last_xact_replay_timestamp();" at slave side. pg_last_xact_replay_timestamp() function gives time stamp of last transaction replayed during recovery, time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. You can try with some operation on Master and then check the fuction output. 

If you want to check the delay manually, then go for below steps:

Step 1:Need to create table on Primary using below command.

create table stream_delay (tstamp timestamp without time zone );

insert into stream_delay select now();

Step 2: schedule the below command on primary to execute every minute on cronjob.

update stream_delay set tstamp='now()';

step 3: verify the delay on slave by selecting the "stream_delay" table.

It should show the last time that was updated in primary. Difference between this timestamp and current timestamp of slave server shows the time delay between Primary and slave.
You can also check the progress of streaming replication by using ps command.#The displayed LSNs indicate the byte position that the standby server has written up to in the xlogs.
[primary] $ ps ­ef | grep sender
postgres  6879  6831  0 10:31 ?        00:00:00 postgres: wal sender
process postgres streaming 0/2000000
[standby] $ ps ­ef | grep receiver
postgres  6878  6872  1 10:31 ? receiver process   streaming 0/2000000
Thanks all, Kindly let me know if I miss anything.