Friday, 20 December 2013

Types of Reads and Levels of Transaction Isolation in PostgreSQL

I see many people asking differences between different types of reads and transaction isolation levels of PostgreSQL. Here you go:

Q1: Types of Reads:
=============
1. dirty reads: 

If one transaction were able to see the uncommitted work of another transaction, that would be a dirty read. That would generally be a very bad thing, since that work might never be committed.

2. non-repeatable reads

If a transaction were able to read a row, and then later read the same row and see different values based on the work of another transaction, that would be a non-repeatable read.

3. phantom reads

If a transaction were able to read a set of rows matching some logical condition, and later read that same logical set of rows and see new ones which qualify because of the work of another transaction, that would be a phantom read. A transaction always sees its own work.

Q2: Levels of Transaction Isolation:
======================
1. TRANSACTION_NONE

This is not a real transaction isolation level. This is defined in the JDBC specification specifically to deal with drivers which don't support transaction isolation levels. It does not match to any transaction isolation level in the SQL standard, and should not be used with a driver which does support transaction isolation levels.

The following isolation levels do correspond to transaction isolation levels described in the SQL standard. Three of these levels are defined by which of the phenomena from Q1 are allowed; the fourth is not defined by the above phenomena, although the standard notes that due to how it is defined, none of the above phenomena are possible. It is also worth noting that the standard does not require any of these phenomena to be allowed at any level; an implementation is conforming to the standard if it is more strict, but not if it is less strict. Among other things, this means that it is OK for a transaction isolation level's implementation to be the same as a more strict transaction isolation level's implementation.

Because PostgreSQL uses Multi-Version Concurrency Control (MVCC) for managing concurrency, it takes “snapshots” as a key component of how it implements these transaction isolation levels. A snapshot controls which other transactions' work will be visible. The work of transactions which have committed at the moment a snapshot is taken will be visible. The work of any transactions which have rolled back will not be visible, nor will the work of any transactions which are active or which have not yet begun when the snapshot was taken.

2. TRANSACTION_READ_COMMITTED

In the SQL standard, this transaction isolation level must not allow dirty reads, but non-repeatable reads and phantom reads are allowed. In PostgreSQL dirty reads are not possible and non-repeatable and phantom reads can, in fact, happen.

PostgreSQL takes a snapshot at the start of each statement, after it has parsed that statement and analyzed it sufficiently to determine that a snapshot is in fact required. This provides the statement with a stable view of the database for the duration of that statement, with one exception: if an UPDATE or DELETE statement finds that it is about to modify a row which a concurrent, uncommitted transaction has already updated or deleted, it blocks and waits for the other transaction to complete. If that other transaction rolls back rather than committing, the UPDATE or DELETE can proceed normally. If the other transaction commits, PostgreSQL will look for the newest version of the row created by updates; if the row has not been deleted and if it still matches the selection criteria for the query, that new version is updated or deleted. Because the statement is no longer working with a single snapshot of the database after blocking for a concurrent transaction's work, there are rare conditions under which this can produce surprising results.

3. TRANSACTION_READ_UNCOMMITTED

In the SQL standard, any of the phenomena described in Q1 are allowed, including dirty reads. In PostgreSQL this is implemented the same as TRANSACTION_READ_COMMITTED; dirty reads are not, in fact, allowed. This is permitted by the SQL standard.

4. TRANSACTION_REPEATABLE_READ

In the SQL standard, this transaction isolation level must not allow dirty reads or non-repeatable reads, but phantom reads are allowed. In PostgreSQL none of these phenomena can, in fact, happen.

PostgreSQL takes a snapshot the first time within such a transaction that it finds that it needs one. This does not happen with the statement which starts the transaction, nor for many of the utility statements such as LOCK TABLE or SET READ ONLY which might be executed near the beginning of the transaction. This is by design, and is an important point to understand when using explicit locking; if you acquire a lock after the snapshot is taken, you might block because of a concurrent update to a row, but if you read that row after the blocking transaction commits, you will see the value according to the snapshot – that is, the old version before the concurrent update. Explicit locks should normally be taken at the start of the transaction, before a snapshot has been acquired, to prevent this.

Once a repeatable read transaction has acquired a snapshot, it uses that for the remainder of the transaction. This technique is called “snapshot isolation” and provides more guarantees than required for repeatable read, but falls short of the requirements of the standard for serializable transactions. It is worth noting that versions of PostgreSQL prior to 9.1 used this transaction isolation level when a serializable transaction was requested. At least one other database product still does this, although it allows certain well-known serialization anomalies to occur.

5. TRANSACTION_SERIALIZABLE

In the SQL standard, there is a requirement that any set of concurrent serializable transactions produces an effect which is consistent with some serial (one-at-a-time) execution of those transactions. The standard notes that due to this definition, none of the phenomena described in Q1 will be possible. Although this definition of the serializable transaction isolation level has been present in every version of the SQL standard for almost 20 years, confusion persists – one can still find mistaken assertions that avoiding the three phenomena mentioned in Q1 is not only necessary but also sufficient to comply with the standard.

In version 9.1 and later PostgreSQL uses a technique called Serializable Snapshot Isolation (often abbreviated as SSI), to implement the serializable transaction isolation level. Rather than using blocking to implement the desired transaction isolation, it allows snapshot isolation (described above for repeatable read transactions) to operate, while monitoring for the conditions under which a serialization anomaly could cause problems. If such a situation occurs, a transaction will be rolled back with a serialization failure (SQLSTATE 40001) and should be retried from the beginning.

In versions prior to 9.1 there was no difference between serializable and repeatable read transactions; the behavior provided by both transaction isolation levels in earlier releases is still available as the repeatable read transaction isolation level in version 9.1 and later.

Saturday, 14 December 2013

Physical Machine Vs Virtual Machine

1)Databases on (VM's) guest operating systems by design when they start they grab blocks of a resource and manage it directly for performance reasons. As soon as you make the core operating system of the database server a guest in virtualized hosting environment then you are placing an arbitration layer with the hypervisor between the block allocated element of disk and RAM and the database server. It will slow down. The more inefficient your queries, the more it will slow.To be clear, that a finely tuned, massively busy, high performance database server should have its own physical hardware.

2) Not Recommended to go with a virtualized deployment because of the I/O penalty in VM its hard to know exactly what the performance penalty is but there is one. It is intrinsically more complex and harder to trace performance problems and does not allow disk I/O bandwidth to be allocated on a per-virtual server basis.

3)There is no doubt that virtualization requires extra hardware resources. The problem is that it is almost impossible to estimate in advance how many extra resources will be needed. We know that there are capacity planning guides and tools but from our experience every piece of software behaves differently in a virtualized environment. We have applications that are quite modest as long as they run on a physical server, but when they were virtualized their resource requirement multiplied.

4)If They are running multiple servers on the same host, the IO situation gets worse: it becomes even more important to carefully manage how many  Servers end up on a single physical host, and more difficult to balance the IO requirements of each server.

5)Unit of DB performance per unit of Hardware is a bit lower for a virtualized db. This means it requires more hardware to get the same level of performance compare to physical server.A common resource problem to look out for is adding additional VM's and thinning out the available resources or allocation of existing resources among virtual servers.And when it shows a sign of getting affected by OS resources it does not remain a recommended option to host large production database in virtual environment.

6)Level of interaction would increase the cost of processes on the virtual server due to its extra layer which could be a potential threat for CPU consumption and writing latency.

Tuesday, 10 December 2013

Step By Step Guide to setup Steaming Replication.

Though many know about streaming replication, I would like to start from Prerequisites and with some Introduction about this replication as this blog is for beginners :-).

Prerequisites:

1. Same version of the PostgreSQL Database must be installed on both servers. 

2. Configure the Password-less ssh authentication to use the “postgres” user. 

3. Production server must operate in WAL archiving enabled mode by setting archive_mode and archive_command in postgresql.conf file.

4. Production and stand by servers always should have connectivity between them to transfer the archived WAL files from production. 

5. Set up your standby host's environment and directory structure exactly the same as your primary. 

Introduction:

It is an asynchronous mechanism; the standby server lags behind the master. But unlike other replication methods, this lag is very short, and can be as little as a single transaction, depending on network speed, database activity, and Streaming replication settings. Also, the load on the master for each slave is minimal, allowing a single master to support dozens of slaves. 

The feature is included in Postgresql-9.0, with this the second database instance (normally on a separate server) replaying the primary's binary log, while making that standby server can accept read-only queries.


Here are the practical steps with necessary commands:
---------------------------------------------------------------

1. Connect to Master and create a "replication" user with replication privilege.
Ex:

$ psql
Password for user postgres:
psql.bin (9.2.1) Type "help" for help.
postgres=# create user replication with replication password '<password>';
2. We need to change some parameters in postgresql.conf and authentication in pg_hba.conf which are located at  /opt/PostgreSQL92/data/ location on Master. Set up connections and authentication so that the standby server can successfully connect to the replication pseudo-database on the primary.
 $ $EDITOR postgresql.conf

listen_addresses = '*'

$ $EDITOR pg_hba.conf

#The standby server must have superuser access privileges.
host replication replication 10.176.0.0/16 md5
3. Set up the streaming replication related parameters on the primary server.
 $EDITOR postgresql.conf
#To enable read­only queries on a standby server, wal_level must be set to "hot_standby". But you can choose "archive" if you never connect to the server in standby mode.

wal_level = hot_standby

#Set the maximum number of concurrent connections from the standby servers.

max_wal_senders = 5 

#To prevent the primary server from removing the WAL segments required for the standby server before shipping them, set the minimum number of segments retained in the pg_xlog directory. 

wal_keep_segments = 32

#Enable WAL archiving on the primary to an archive directory accessible from the standby. If wal_keep_segments is a high enough number to retain the WAL segments required for the standby server, this may not be necessary.

archive_mode    = on 

archive_command = 'cp %p <archive location>%f && scp %p postgres@10.176.112.189:<archive location>/%f'

Note: Restart the cluster after modifying the above parameters in postgresql.conf file.
4. Re-start postgres on the primary server and check if the parameters are affected.
postgres=# show archive_command ;
       archive_command
­­­­­­­­­­­­­­­­­­­­­­­­­­­­­      -----------------
 cp %p /var/PG92_Archives/%f
(1 row)
postgres=# show archive_mode ;
 archive_mode
­­­­­­­­­­­­­­-------------
 on
(1 row)
postgres=# show wal_level ;
  wal_level
­­­­­­­­­­­­­------------
 hot_standby
(1 row)
postgres=# show max_wal_senders ;
 max_wal_senders
­­­­­­­­­­­­­­­­­----------------
 5
(1 row)
postgres=# show wal_keep_segments ;
 wal_keep_segments
­­­­­­­­­­­­­­­­­­­------------------
 32
5. Make a base backup of Master server's data directory.
$ psql ­c "SELECT pg_start_backup('label', true)" 
$ cp /opt/PostgreSQL92/data/* backup/ 
$psql ­c "SELECT pg_stop_backup()" 
-- tar the backup directory and move to standby location. 
$tar ­cvzf backup.tar backup/

$scp backup.tar postgres@10.176.112.189:/opt/PostgreSQL92/ 
6. Move the slave data directory contents to any other location, untar the backup file and copy contents to slave data directory.
7. Set up replication-related parameters, connections and authentication in the standby server like the primary, so that the standby might work as a primary after failover. 8. Enable read-only queries on the standby server. But if wal_level is archive on the primary, leave hot_standby unchanged (i.e., off).
$ $EDITOR postgresql.conf
hot_standby = on
9. Create a recovery command file in the standby server; the following parameters are required for streaming replication.
$ $EDITOR recovery.conf

# Specifies whether to start the server as a standby. In streaming
# replication, this parameter must to be set to on.
standby_mode          = 'on'
# Specifies a connection string which is used for the standby server to
# connect with the primary.

primary_conninfo      = 'host=10.176.112.188 port=5432 user=replication
password=<password> application=<app_name>'

# Specifies a trigger file whose presence should cause streaming
# replication to end (i.e., failover).

trigger_file = '' ===> Do not create the file. You have to
create the file when failover.

# Specifies a command to load archive segments from the WAL archive. If
# # wal_keep_segments is a high enough number to retain the WAL segments
# # required for the standby server, this may not be necessary. But
# a large workload can cause segments to be recycled before the standby
# is fully synchronized, requiring you to start again from a new base
# backup.

restore_command = 'cp %f "%p"' 
10. Start postgres in the standby server. It will start streaming replication and you will see log messages like below:
LOG:  entering standby mode
LOG:  consistent recovery state reached at 0/1D000078
LOG:  record with zero length at 0/1D000078
LOG:  streaming replication successfully connected to primary 
11. You can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby. They can be retrieved using pg_current_xlog_location function on the  primary side and the pg_last_xlog_receive_location or pg_last_xlog_replay_location function on the standby, respectively.
$ psql ­c "SELECT pg_current_xlog_location()" ­h192.168.0.10
(primary host)
 pg_current_xlog_location
­­­­­­­­­­­­­­­­­­­­­­­­­­
 0/2000000
(1 row)

$ psql ­c "select pg_last_xlog_receive_location()" ­h192.168.0.20
(standby host)
 pg_last_xlog_receive_location
­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­
 0/2000000
(1 row)
$ psql ­c "select pg_last_xlog_replay_location()" ­h192.168.0.20
(standby host)
 pg_last_xlog_replay_location
­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­
 0/2000000
(1 row)
12. Other ways to check streaming replication: the easiest way is "select now()-pg_last_xact_replay_timestamp();" at slave side. pg_last_xact_replay_timestamp() function gives time stamp of last transaction replayed during recovery, time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. You can try with some operation on Master and then check the fuction output. 

If you want to check the delay manually, then go for below steps:

Step 1:Need to create table on Primary using below command.

create table stream_delay (tstamp timestamp without time zone );

insert into stream_delay select now();

Step 2: schedule the below command on primary to execute every minute on cronjob.

update stream_delay set tstamp='now()';

step 3: verify the delay on slave by selecting the "stream_delay" table.

It should show the last time that was updated in primary. Difference between this timestamp and current timestamp of slave server shows the time delay between Primary and slave.
You can also check the progress of streaming replication by using ps command.#The displayed LSNs indicate the byte position that the standby server has written up to in the xlogs.
[primary] $ ps ­ef | grep sender
postgres  6879  6831  0 10:31 ?        00:00:00 postgres: wal sender
process postgres 127.0.0.1(44663) streaming 0/2000000
[standby] $ ps ­ef | grep receiver
postgres  6878  6872  1 10:31 ? receiver process   streaming 0/2000000
Thanks all, Kindly let me know if I miss anything.

Saturday, 16 November 2013

Backup my PostgreSQL Database


There are three fundamentally different approaches to backing up PostgreSQL databases:

1. SQL dump
2. File system level backup
3. Continuous Archiving

1. SQL dump:
--------------
•  Generate a text file with SQL commands
•  PostgreSQL provides the utility program pg_dump for this purpose.

For more details please go through the below link:-
http://www.enterprisedb.com/docs/en/9.2/pg/app-pgdump.html

Let us see some examples here:

 --> Take the dump of whole cluster(use pg_dumpall for that).
     ---> Go to your binaries location and execute below command:

     -bash-4.1$ ./pg_dumpall -p 5433 > /tmp/dumpall.sql

      Where -p is port of cluster.

--> Take the dump of a database(use pg_dump for that)
   plain format:
  -bash-4.1$ ./pg_dump -p 5433 -U postgres postgres >> /tmp/dump_postgres_db.sql
   Compressed Format:
   -bash-4.1$ ./pg_dump -p 5433 -U postgres postgres -Fc -f /tmp/dump.dmp

   Where -p - port
         -U - user
         -Fc - Format Compressed(you can use tar by using -Ft)
         -f - dumpfile.

--> Take the dump of a table/sequence

     -bash-4.1$ ./pg_dump -p 5433 -U postgres -t test postgres >> /tmp/dump_test_table.sql

     Where -t - tablename/sequencename

2. File system level backup:
------------------------------
•  An alternative backup strategy is to directly copy the files that PostgreSQL uses to store the data in the database.
•  You can use whatever method you prefer for doing usual file system backups, for example:
   tar -cf backup.tar <location of data directory>
•  The database server must be shut down in order to get a usable backup.
•  File system backups only work for complete backup and restoration of an entire database cluster.
•  File system snapshots work for live servers.

If you take File system level backup, your database has no need to run on archivelog mode but nees to shutdown the server while taking the backup. It is a consistence backup. But you have chances to loose the data.

Scenario:
----------
If you have taken the backup at 8p.m and your database is crashed due to some reason at 9p.m. Then you will loose the data of 8p.m to 9p.m as we have backup at 8p.m only. The 1 hour data will not be recovarable.

3. Continuous Archiving:
--------------------------
•  PostgreSQL maintains WAL files for all transactions in pg_xlog directory
•  PostgreSQL automatically maintains the WAL logs which are full and switched
•  Continuous archiving can be setup to keep a copy of switched WAL Logs which can be later used for recovery.
•  It also enables online file system backup of a database cluster.
•  Requirements:
   wal_level must be set to archive
   archive_mode must be set to on
   archive_command must be set in postgresql.conf

For more details on how to enable archiving, you can follow the below link:-
http://www.enterprisedb.com/docs/en/9.2/pg/continuous-archiving.html

Benefits of running database in Continuous Archiving:
---------------------------------------------------------
a. No need to shutdown the server(Means no downtime required).
b. You will not loose the data.
c. Running in archivelog mode does copy your all WAL files to the location mentioned in configuration file.
d. You can have multiple locations(duplexing of archives) for your archivelog files.(recommended to have these locations on different mount points because if one mount point got corrupted then you can get them from other one).

Scenario:
----------
If you have taken the backup at 8p.m and your database is crashed due to some reason at 9p.m, then you can recover the data till 9PM as we will have copied archive files from 8p.m to 9p.m in archive log location mentioned in the configuration file. If you recover backup of 8p.m , it will automatically apply these archivelogs while database recovery, no manual operation is required.

Getting Started with PostgreSQL for Beginners

I hope these commands will hep beginners to start with PostgreSQL.

We have “pg_ctl” command which we can use to control the postgresql database. We can see this commands behavior as below.
>> How to check the postgresql service status.
$ su – root
# service postgresql­9.2 status
(OR)
#su - postgres
$ /opt/PostgreSQL92/bin/pg_ctl ­D /opt/PostgreSQL92/data status # We need to pass data directory path after ­D.

>> How to start the postgresql service.

 $ su – root
# service postgresql­9.2 start

>> How to start the postgresql as a postgres user(operating system user).

 $ /opt/PostgreSQL92/bin/pg_ctl ­D /opt/PostgreSQL92/data start. 

>> How to stop the postgresql.
$ su – root
# service postgresql­9.2 stop
(OR)
#su - postgres
$ /opt/PostgreSQL92/bin/pg_ctl ­D /opt/PostgreSQL92/data stop ­mf 


>> How to reload the postgresql.

Reload postgresql means, force the postgresql to allow the modifications which we do in postgresql.conf/pg_hba.conf.
 $ su – root
# service postgresql­9.2 reload 

(OR)
$/opt/PostgreSQL92/bin/pg_ctl ­D /opt/PostgreSQL92/data reload 
**Note: Any modifications in pg_hba.conf must require a reload operation of a postgresql database. But, all the parameters in postgresql.conf don’t imply on the database until and unless we restart the database.
>> How to find which parameter change require restart.
This query list out all the parameters which require your postgresql database restart.
 postgres=# select name from pg_settings where context ~ 'postmaster';
This query list out all the parameters which do not require postgresql database restart but only reload.
 postgres=# select name from pg_settings where context != 'postmaster';
>> How to monitor the database connections like which application user executing which query from which ip. This query list out all the connection details of a cluster.
 postgres=# select * from pg_stat_activity ;
This query list out all the active connections list, i.e, the connections which are doing some actions on the database.
 postgres=# select * from pg_stat_activity where waiting is false;
This query list out all the connections list which are waiting.
 postgres=# select * from pg_stat_activity where waiting is true;
>> How to get the object details.  In postgresql, we have some meta commands which helps you to get the list of objects. Ex:-
 postgres=# \dt ­­> Meta command to get the list of tables.
List of relations


Schema | Name | Type | Owner
­­­­­­­­+­­­­­­­­+­­­­­­­+­­­­­­
public | test_p | table | postgres 

\dv --> Meta command to get the list of views.
\di --> List of indexes.


\d  --> Table description 
You can find all these meta commands by executing “\?” in the psql terminal. These commands are not supported in any other applications. For this, we need to use “psql” terminal as below.
 bash­4.1$ ./psql ­p 5432 ­U postgres postgres
psql.bin (9.1.6, server 9.2.0)

WARNING: psql.bin version 9.1, server version 9.2.
Some psql features might not work.
Type "help" for help.
postgres=#

where p - port, U - user
postgres is a database in the cluster.
>> How to get the current locks in database.

 postgres=# select * from pg_locks;
>> How to check the details of replication between the primary and slave.
 postgres=# select * from pg_stat_replication ;
The above query gives us the state of the replication and lag between the primary and slave servers.
>> How to find the blocked queries.
This query list out all the sessions those are currently waiting on which sessions.
SELECT now()::timestamp,waiting.locktype AS waiting_locktype,waiting.relation::regclass::VARCHAR  ASwaiting_table,waiting_stm.Datname as WDatabase,waiting_stm.current_query AS waiting_query,(extract(epochfrom now()) - extract(epoch from  waiting_stm.query_start))::VARCHAR AS Waiting_Stmt_Total_Time,waiting.mode AS waiting_mode,waiting.pid::VARCHAR AS waiting_pid,other.locktype::VARCHAR AS other_locktype,other.relation::regclass::VARCHAR AS other_table,other_stm.Datname::VARCHAR as ODatabase,other_stm.current_query AS other_query,(extract(epoch from now())­-extract(epoch from other_stm.query_start))::VARCHAR AS Other_Stmt_Total_Time,other.mode AS other_mode,other.pid::VARCHAR AS other_pid,other.granted::VARCHAR AS other_granted FROM pg_catalog.pg_locks AS waiting JOIN pg_catalog.pg_stat_activity AS waiting_stm ON (waiting_stm.procpid = waiting.pid) JOIN pg_catalog.pg_locks AS other On ((waiting.database = other.database AND waiting.relation = other.relation) OR waiting.transactionid = other.transactionid) JOIN pg_catalog.pg_stat_activity AS other_stm ON (other_stm.procpid = other.pid) WHERE NOT waiting.granted AND waiting.pid <> other.pid;
>> How to get the database statistics like hitratio, dml statistics.
SELECT sdb.datname,blks_read,blks_hit,round((blks_hit::float / (blks_read+blks_hit+1) * 100)::numeric, 2) as cachehitratio,xact_commit,xact_rollback,tup_returned,tup_fetched,tup_inserted,tup_updated,tup_deleted,ROUND(100.0*AGE(DATFROZENXID)/ (SELECT SETTING::float FROM PG_SETTINGS WHERE NAME ILIKE 'autovacuum_freeze_max_age')) FROM pg_stat_database sdb INNER JOIN pg_database db on sdb.datname=db.datname WHERE sdb.datname !~ '^(template(0|1))' ORDER BY cachehitratio desc;
The above query will give you the database statistics. 

PostgreSQL Installation

PostgreSQL Installation using installer:

Here are steps to install the postgres through installer:

 --> Download the installer from below link:
      
       http://www.enterprisedb.com/products-services-training/pgdownload

 --> Goto the location of PostgreSQL 9.2 installer(postgresql-9.2.5-1-linux-x64.run) and run the installer in text mode as below
 
     

 ./postgresql-9.2.5-1-linux-x64.run --mode text   --> You need to provide the below details while installation.

Installation Directory [/opt/PostgreSQL/9.2]: ===> Provide the base directory for installation (We have provided /opt/PostgreSQL92 in test servers)

Data Directory [/opt/PostgreSQL92/data]: ===> Provide the data directory location(We have provided /opt/PostgreSQL92/data location)

Password : ==> provide the password for "postgres" superuser Retype password :

Port [5433]: ===> provide the port number

Locale
[1] [Default locale] [2] C
[3] en_AG
[4] en_AG.utf8
[5] en_AU.utf8 [6] en_BW.utf8 [7] en_CA.utf8 [8] en_DK.utf8 [9] en_GB.utf8 [10] en_HK.utf8 [11] en_IE.utf8 [12] en_IN
[13] en_IN.utf8 [14] en_NG [15] en_NG.utf8 [16] en_NZ.utf8 [17] en_PH.utf8 [18] en_SG.utf8
[19] en_US.utf8
[20] en_ZA.utf8
[21] en_ZW.utf8
[22] POSIX
Please choose an option [1] : 1 ==> select the locale (By default it is [1] Default locale)
Do you want to continue? [Y/n]: y
---------------------------------------------------------------------------- Please wait while Setup installs PostgreSQL on your computer.
Installing
0% ______________ 50% ______________ 100% 
#################################




--> After the installation, you need to set environment variables. There is a pg_env.sh file in which all environment variables are exported in /opt/PostgreSQL92(which base installation directory). You can run that file to set env settings Or you can set it in .bash_profile where every postgres user login env file gets run.


--> Open the bash file $vi .bash_profile -- Enter the below line:

source /opt/PostgreSQL92/pg_env.sh -- Run the bash file for the first time
$. .bash_profile

--> Now you can connect to database by just typing the "psql"


-bash-4.1$ psql 
Timing is on.
psql.bin (9.2.5)
Type "help" for help.

postgres=# 

Monday, 28 October 2013

PostgreSQL Architecture

I know its not a first blog on PostgreSQL Architecture, but its my first one.. :-) I have created a diagram of PostgreSQL architecture based on my experience with the database. 

I have discussed this chart with PostgreSQL experts on how Postgres works and here is some description from my discussion:


Architecture Diagram:

=============





Working Process:
===========

When you start PostgreSQL, The Postmaster starts first and allocates the shared memory. It also accepts connections and spins off a backend for each new connection. So each backend (server process) gets its pointers to shared memory from the postmaster. It is pretty disastrous if the postmaster dies with backends still running, so we have it do as little as possible, so that there isn't as much which can crash it. Postgres does have a pool of shared memory; however, it does not have a library or dictionary cache stored in that memory. This means that statements do need to be parsed and planned every time they are entered. If parse/plan overhead is an issue, we suggest the use of prepared statements. While Oracle is able to avoid the repeated parse/plan overhead, it must still do enough analysis of the query to determine whether the information is present in the library cache, which also consumes some time and CPU resources. The parser is quite lightweight, so we feel that the overhead of parsing the query each time is acceptable.


1. Shared Memory:

----------------------

SHARED BUFFERS -- The biggest chunk of shared memory is shared_buffers. When pages from a table or index are read from the OS, they are read into shared_buffers, and the backends reference the pages and their contents right there in shared memory. An exception is temporary tables, where (since only the creating backend can reference the temp table) data is accessed in temp_buffer space as much as possible(temp_buffers is separate. It is not in shared memory). It's faster to access process-local memory like that because you don't need to worry about pinning or locking the data, since you are not sharing it.

WAL BUFFERS -- are for buffering data to be written to the WAL files.
CLOG BUFFERS -- are one of the SLRU-style buffers oriented toward circular "rings" of data, like which transaction numbers have been committed or rolled back.
LOCK SPACE -- Memory structures in shared memory are generally protected by "lightweight" locks, which are in shared memory. Tables are protected by "heavyweight" locks which are also in shared memory (and themselves protected by lightweight locks). Of course, lightweight locks are protected by spinlocks. It gets very complicated and fussy. :-)
OTHER BUFFERS -- are probably mostly SLRU buffers besides CLOG (which was the first user of the SLRU system). SLRU is good for data where you mostly want to use recently accessed data and then you are done with it relatively quickly.
The opposite of shared memory is process-local memory -- only the one process that allocates it can access it. Each SLRU system has a separate subdirectory. Shared memory is memory that all of the backend server processes can directly access. To prevent chaos, access to shared memory must follow some rules which tends to make it a little slower, like locking areas of memory a process will be using. Process-local memory is allocated by one backend server process, and the other backend server processes can't see it or use it, so it's faster to access, and no worries about another process trashing it while you're using it.
Getting back to the shared memory uses I hadn't talked about, CLOG buffers and SLRU buffers like multixact, notify, subtrans, serial, etc. use buffers in memory for recently accessed data, but spill to disk in those subdirectories beyond a limited number of buffers.

2. Utility Processes:
---------------------
With a default configuration I see the postmaster, the checkpointer process, the writer process, the wal writer process, the autovacuum launcher process, and the stats collector process. I think you will see more processes running if you turn on archiving or streaming replication. You might also get a process for writing the server log, depending on configuration. As their name say, WRITER -- process is responsible to write the dirty buffers to data files, CHECKPOINTER -- process is for checkpoint, WAL WRITER -- is for writing the dirty buffers in WAL buffers to WAL files, AUTOVACUUM LAUNCHER -- process lauches autovacuum when require(depends on your autovacuum settings in postgresql.conf file) and STATS COLLECTOR -- process to collect the statistics of objects in the database require by Optimizer to improve the performance.

The checkpointer process is responsible for creating safe points from which a recovery can begin; the background writer tries to keep some pages available for re-use so that processes running queries don't need to wait for page writes  in order to have free spots to use in shared buffers. Both checkpointer and writer processes writes to the same files, however the checkpointer writes all data that was dirty as of a certain time (the start of the checkpoint) regardless of how often it was used since dirtied, and the background writer writes data that hasn't been used recently, regardless of when it was first dirtied.  Neither knows or cares whether the data being written was committed, rolled back, or still in progress.

3. Directory Structure:
--------------------------
All the data needed for a database cluster is stored within the cluster's data directory, commonly referred to as PGDATA. You can get the detailed description at below link:
http://www.enterprisedb.com/docs/en/9.2/pg/storage-file-layout.html

I see the diagram left out the one I would like to add: pg_serial. pg_serial is used to track summarized information about committed serializable transactions which might still become part of a serialization failure rolling back some not-yet-committed transaction to protect data integrity.
The catalog cache is information from the system tables which describes the tables, indexes, views, etc. in the database. If you had to re-read that from the system tables each time, it would be slow. Even shared memory would be clumsy for that, so each backend process has its own cache of system catalog data for fast lookup. When anything changes, all backends are sent a signal to update or reload their cache data. When pages are read or written, they go through the OS cache, which is not directly under PostgreSQL control. The optimizer needs to keep track of a lot of information while it parses and plans a query, which is why that is shown. A plan has execution nodes, some of which may need to use memory; that is where work_mem comes in -- a sort or hash table (as examples) will try not to exceed work_mem *for that node*. It is significant that one query might use quite a few nodes which each allocate memory up to work_mem. But since most queries are simpler and might not use any work_mem allocations, people often do their calculations based on an expected maximum of one allocation per backend (i.e., per connection). But that could be off by quite a bit if all connections might be running queries with five nodes allocating memory.

It is worth noting that if there is enough RAM on the machine to have a good-sized OS cache, a PostgreSQL page read will often just be a copy from system cache to pg shared_buffers, and a page write will often just be a copy from pg shared_buffers to the system cache. The fsync of tables which is part of the checkpoint process is when they are actually written from the OS to the storage system. But even there a server may have a battery-backed RAM cache, so the OS write to storage is often just a copy in RAM.... unless there is so much writing that the RAID controller's cache fills, at which point writes suddenly become hundreds of times slower than they were.


Other interesting dynamics: pg will try to minimize disk writes by hanging onto dirty buffers (ones which have logically been updated) before writing them to the OS. But buffers may need to be written so they can be freed so that a new read or write has a buffer to use. If a request to read a page or write to a new buffer can't find an idle page, the query might need to write a buffer dirtied by some other backend before it can do its read (or whatever). The background writer can help with this. It tries to watch how fast new pages are being requested and write out dirty pages at a rate which will stay ahead of demand.