Saturday 29 March 2014

Is it failback or switchover with PostgreSQL? Ah, maybe not...

Sometimes I feel like I have found something new in Postgres, but most of the time it turns out to be something people already know. I suspect this is another such case (many of you may already know it). I see many people asking about switchover/failback in PostgreSQL, and I always suggest taking a fresh backup of the new master (after failover) and syncing the old master as a slave to it. Of course, that is the correct procedure. However, there is another way to achieve this, as in the procedure below:
 1. Set up streaming replication between two clusters, i.e. "Master_but_Slave_soon" on port 8888 and "Slave_gonna_Master" on port 8889. See here to set up SR. (A sketch of the primary-side settings assumed for this setup appears after the output below.)
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_ctl -D /tmp/rep/Slave_gonna_Master/ start
server starting
-bash-4.1$ LOG:  database system was interrupted; last known up at 2014-03-30 09:26:59 IST
LOG:  entering standby mode
LOG:  restored log file "000000010000000000000003" from archive
LOG:  redo starts at 0/3000028
LOG:  consistent recovery state reached at 0/30000F0
LOG:  database system is ready to accept read only connections
cp: cannot stat `/tmp/rep/arch/000000010000000000000004': No such file or directory
LOG:  started streaming WAL from primary at 0/4000000 on timeline 1
-bash-4.1$
-bash-4.1$ ./pg_ctl -D /tmp/rep/Master_but_Slave_soon/ status
pg_ctl: server is running (PID: 26531)
/opt/PostgreSQL/9.3/bin/postgres "-D" "/tmp/rep/Master_but_Slave_soon"
-bash-4.1$
-bash-4.1$
-bash-4.1$ ./psql -p 8888 postgres
Timing is on.
psql.bin (9.3.2)
Type "help" for help.

postgres=# \x 
Expanded display is on. 
postgres=# select * from pg_stat_replication; 
-[ RECORD 1 ]----+--------------------------------- 
pid              | 26651 
usesysid         | 10 
usename          | postgres 
application_name | walreceiver 
client_addr      | 127.0.0.1 
client_hostname  | 
client_port      | 26058 
backend_start    | 2014-03-30 09:28:59.204298+05:30 
state            | streaming 
sent_location    | 0/4000000 
write_location   | 0/4000000 
flush_location   | 0/4000000 
replay_location  | 0/4000000 
sync_priority    | 0 
sync_state       | async 
 
Time: 155.089 ms 
postgres=# 
postgres=# 
postgres=# \q 
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ ./pg_ctl -D /tmp/rep/Slave_gonna_Master/ status 
pg_ctl: server is running (PID: 26643) 
/opt/PostgreSQL/9.3/bin/postgres "-D" "/tmp/rep/Slave_gonna_Master" 
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ ./psql -p 8889 postgres 
Timing is on. 
psql.bin (9.3.2) 
Type "help" for help. 
 
postgres=# 
postgres=# select pg_is_in_recovery(); 
 pg_is_in_recovery 
------------------- 
 t 
(1 row) 
 
Time: 3.031 ms 
postgres=#
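
For reference, here is a rough sketch of the primary-side (port 8888) settings this kind of setup assumes; the values (archive directory, max_wal_senders) are illustrative rather than copied from the original configuration:

# postgresql.conf on Master_but_Slave_soon (port 8888)
wal_level = hot_standby
max_wal_senders = 3
archive_mode = on
archive_command = 'cp %p /tmp/rep/arch/%f'

# the standby additionally needs hot_standby = on to accept read-only connections
# pg_hba.conf on the primary -- allow the replication connection from localhost
host    replication    postgres    127.0.0.1/32    trust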
2. Now go for failover by creating the trigger file (an alternative using pg_ctl promote is noted after the output below). After this, our slave is the new master.
-bash-4.1$ cat /tmp/rep/Slave_gonna_Master/recovery.conf 
standby_mode=on 
primary_conninfo='host=localhost port=8888' 
restore_command='cp /tmp/rep/arch/%f %p' 
trigger_file='/tmp/rep/8888.trigger' 
 
-bash-4.1$ touch /tmp/rep/8888.trigger 
LOG:  trigger file found: /tmp/rep/8888.trigger 
LOG:  redo done at 0/5000028 
cp: cannot stat `/tmp/rep/arch/000000010000000000000005': No such file or directory 
cp: cannot stat `/tmp/rep/arch/00000002.history': No such file or directory 
LOG:  selected new timeline ID: 2 
cp: cannot stat `/tmp/rep/arch/00000001.history': No such file or directory 
LOG:  archive recovery complete 
LOG:  autovacuum launcher started 
LOG:  database system is ready to accept connections 
 
-bash-4.1$ 
-bash-4.1$ /opt/PostgreSQL/9.3/bin/psql -p 8889 -U postgres postgres 
Timing is on. 
psql.bin (9.3.2) 
Type "help" for help. 
 
postgres=# 
postgres=# select pg_is_in_recovery(); 
 pg_is_in_recovery 
------------------- 
 f 
(1 row) 
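
As an aside (not part of the run above), the standby could also be promoted without a trigger file using pg_ctl promote, for example:

-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_ctl -D /tmp/rep/Slave_gonna_Master/ promote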
 
3. Once failover is done, the timeline ID of the WAL files changes (a pg_controldata check is sketched after the listing below).
bash-4.1$ ls -ltrh /tmp/rep/Slave_gonna_Master/pg_xlog/ 
total 97M 
-rw-------. 1 postgres postgres  16M Mar 30 09:27 000000010000000000000002 
-rw-------. 1 postgres postgres  16M Mar 30 09:27 000000010000000000000001 
-rw-------. 1 postgres postgres  16M Mar 30 09:28 000000010000000000000003 
-rw-------. 1 postgres postgres  16M Mar 30 09:35 000000010000000000000004 
-rw-------. 1 postgres postgres  16M Mar 30 09:35 000000010000000000000005 
-rw-------. 1 postgres postgres   41 Mar 30 09:36 00000002.history 
drwx------. 2 postgres postgres 4.0K Mar 30 09:36 archive_status 
-rw-------. 1 postgres postgres  16M Mar 30 09:41 000000020000000000000005 
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ cat /tmp/rep/Slave_gonna_Master/pg_xlog/*.history 
1  0/5000090    no recovery target specified
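
The new timeline can also be confirmed with pg_controldata (a sketch, with the output trimmed to the relevant line; the value 2 matches the timeline selected in the log above):

-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_controldata /tmp/rep/Slave_gonna_Master/ | grep TimeLineID
Latest checkpoint's TimeLineID:       2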
4. Now shut down the old master, create a recovery.conf file (changing the port number in primary_conninfo to point to the new master), set hot_standby to on in postgresql.conf, and add pg_hba.conf entries on the new master to allow connections (the resulting recovery.conf is sketched after the commands below).
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_ctl -D /tmp/rep/Master_but_Slave_soon/ stop -mf 
waiting for server to shut down....LOG:  received fast shutdown request 
LOG:  aborting any active transactions 
LOG:  autovacuum launcher shutting down 
LOG:  shutting down 
LOG:  database system is shut down 
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ cp -R /tmp/rep/Slave_gonna_Master/recovery.done /tmp/rep/Master_but_Slave_soon/recovery.conf 
-bash-4.1$
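
After the copy, recovery.conf on the old master should end up looking roughly like this (a sketch; the port now points to the new master on 8889, and the trigger file name shown is only illustrative):

-bash-4.1$ cat /tmp/rep/Master_but_Slave_soon/recovery.conf
standby_mode=on
primary_conninfo='host=localhost port=8889'
restore_command='cp /tmp/rep/arch/%f %p'
trigger_file='/tmp/rep/8889.trigger'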
5. Now move pg_control of the old master to pg_control.old and copy pg_control from the new master to the old master (I'm not sure whether copying pg_control between replication clusters while they are in sync is a recommended practice).
-bash-4.1$ mv /tmp/rep/Master_but_Slave_soon/global/pg_control /tmp/rep/Master_but_Slave_soon/global/pg_control.old 
-bash-4.1$ 
-bash-4.1$ cp -R /tmp/rep/Slave_gonna_Master/global/pg_control /tmp/rep/Master_but_Slave_soon/global/ 
6. Now try to start the Old Master:
-bash-4.1$ ./pg_ctl -D /tmp/rep/Master_but_Slave_soon/ start 
server starting 
-bash-4.1$ LOG:  database system was interrupted; last known up at 2014-03-30 09:36:11 IST 
LOG:  entering standby mode 
LOG:  restored log file "00000002.history" from archive 
LOG:  record with zero length at 0/50000F8 
LOG:  invalid primary checkpoint record 
LOG:  using previous checkpoint record at 0/4000060 
LOG:  database system was not properly shut down; automatic recovery in progress 
LOG:  redo starts at 0/4000028 
LOG:  record with zero length at 0/5000090 
LOG:  consistent recovery state reached at 0/5000090 
LOG:  database system is ready to accept read only connections 
cp: cannot stat `/tmp/rep/arch/000000020000000000000005': No such file or directory 
LOG:  restored log file "000000010000000000000005" from archive 
LOG:  record with zero length at 0/5000090 
LOG:  started streaming WAL from primary at 0/5000000 on timeline 2 
7. Check the replication status between the New Master (Old Slave) and the Old Master (New Slave).
-bash-4.1$ /opt/PostgreSQL/9.3/bin/psql -p 8889 -U postgres postgres 
Timing is on. 
psql.bin (9.3.2) 
Type "help" for help. 
 
postgres=# 
postgres=# select pg_is_in_recovery(); 
 pg_is_in_recovery 
------------------- 
 f 
(1 row) 
 
Time: 1.469 ms 
postgres=# \x 
Expanded display is on. 
postgres=# select * from pg_stat_replication ; 
-[ RECORD 1 ]----+--------------------------------- 
pid              | 26948 
usesysid         | 10 
usename          | postgres 
application_name | walreceiver 
client_addr      | 127.0.0.1 
client_hostname  | 
client_port      | 52443 
backend_start    | 2014-03-30 09:38:09.862199+05:30 
state            | streaming 
sent_location    | 0/5000340 
write_location   | 0/5000340 
flush_location   | 0/5000340 
replay_location  | 0/5000340 
sync_priority    | 0 
sync_state       | async 
 
Time: 52.213 ms 
postgres=# 
postgres=# 
postgres=# 
postgres=# create table after_failover(t int); 
CREATE TABLE 
Time: 67.959 ms 
postgres=# 
postgres=# 
postgres=# \q 
-bash-4.1$ /opt/PostgreSQL/9.3/bin/psql -p 8888 -U postgres postgres 
Timing is on. 
psql.bin (9.3.2) 
Type "help" for help. 
 
postgres=# 
postgres=# select pg_is_in_recovery(); 
 pg_is_in_recovery 
------------------- 
 t 
(1 row) 
 
Time: 1.411 ms 
postgres=# \dt after_failover 
             List of relations 
 Schema |      Name      | Type  |  Owner   
--------+----------------+-------+---------- 
 public | after_failover | table | postgres 
(1 row) 

Though it works here, I haven't tested this beyond the scenario above, and it might not work in other situations.

Saturday 8 March 2014

Want to ensure that Postgres generates core dump after server crash?


As you all know, we have to start the server with the "-c" option in order to generate a core dump when the server crashes. But if you want to check whether your running server was already started with core dumps enabled, here you go:
Start the PostgreSQL cluster with the core dump option:
pg_ctl -c -D <data directory location> start

Here are the steps to find out whether the server was started with the -c option.

1. Find the PID of the postmaster and go to the location /proc/<postmaster PID>.
2. At this location, inspect the "limits" file to check whether the server was started with the "-c" option. If there are no restrictions for the database user (e.g. enterprisedb) in the "limits.conf" file under "/etc/security", then the "Soft Limit" of "Max core file size" in the "limits" file will be "unlimited". The entry should look like this:

Limit                     Soft Limit           Hard Limit           Units
Max core file size        unlimited            unlimited            bytes

If "Soft Limit" of "Max core file size" in "limits" file is "0", it indicates that your postmaster was not started with "-c" option.

Limit                     Soft Limit           Hard Limit           Units
Max core file size        0                    unlimited            bytes


If there are restrictions for all users or for the enterprisedb user in the "limits.conf" file under "/etc/security", then the Soft Limit and Hard Limit values above will depend on the values in "limits.conf".

Put simply, if your postmaster was started with the "-c" option, then the "Soft Limit" of "Max core file size" in the "limits" file should be "unlimited" or some non-zero value (depending on limits.conf).
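
A quick one-liner for the same check (a sketch, relying on the fact that the first line of postmaster.pid holds the postmaster PID; the output line shown is what you should see when the server was started with -c):

-bash-4.1$ grep 'Max core file size' /proc/$(head -1 /opt/PostgreSQL/9.3/data/postmaster.pid)/limits
Max core file size        unlimited            unlimited            bytes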

If we have started our postmaster with the "-c" option, a core dump file will be generated under the $PGDATA directory when a backend crashes. The core dump is generated by the operating system, and we can attach gdb to it to collect a stack trace or other information.

Here is a simple test case that shows how a core dump is generated:

1. We have started the postmaster with the "-c" option.
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_ctl -D /opt/PostgreSQL/9.3/data -c start
server starting

2. Go to /proc/<postmaster PID> and check the limits file to see whether the postmaster was started with the "-c" option.
-bash-4.1$ ps -ef|grep PostgreSQL|grep 9.3
postgres 48892     1  0 14:28 pts/15   00:00:00 /opt/PostgreSQL/9.3/bin/postgres -D /opt/PostgreSQL/9.3/data
postgres 48980 46943  0 14:30 pts/15   00:00:00 grep PostgreSQL
-bash-4.1$
-bash-4.1$ cd /proc/48892/
enterprisedb@ubuntu:/proc/6711$ ls -ltr limits
-r-------- 1 enterprisedb enterprisedb 0 2012-08-28 05:15 limits
-bash-4.1$ grep 'core' limits
Max core file size        unlimited            unlimited            bytes   
-bash-4.1$ cat limits
Limit                     Soft Limit           Hard Limit           Units   
Max cpu time              unlimited            unlimited            seconds 
Max file size             unlimited            unlimited            bytes   
Max data size             unlimited            unlimited            bytes  
Max stack size            10485760             unlimited            bytes   
Max core file size        unlimited            unlimited            bytes   
Max resident set          unlimited            unlimited            bytes   
Max processes             1024                 15777                processes
Max open files            1024                 1024                 files   
Max locked memory         65536                65536                bytes   
Max address space         unlimited            unlimited            bytes   
Max file locks            unlimited            unlimited            locks   
Max pending signals       15777                15777                signals 
Max msgqueue size         819200               819200               bytes   
Max nice priority         0                    0                   
Max realtime priority     0                    0                   
Max realtime timeout      unlimited            unlimited            us      

3. For example, aborting any backend or background process will generate a core dump file under the $PGDATA directory.
-bash-4.1$ ps -ef|grep 48892
postgres 48892     1  0 14:28 pts/15   00:00:00 /opt/PostgreSQL/9.3/bin/postgres -D /opt/PostgreSQL/9.3/data
postgres 48893 48892  0 14:28 ?        00:00:00 postgres: logger process
postgres 49090 48892  0 14:32 ?        00:00:00 postgres: checkpointer process
postgres 49091 48892  0 14:32 ?        00:00:00 postgres: writer process
postgres 49092 48892  0 14:32 ?        00:00:00 postgres: wal writer process
postgres 49093 48892  0 14:32 ?        00:00:00 postgres: autovacuum launcher process           
postgres 49094 48892  0 14:32 ?        00:00:00 postgres: stats collector process
postgres 49114 46943  0 14:32 pts/15   00:00:00 grep 48892
-bash-4.1$
-bash-4.1$ kill -ABRT 49091 
-bash-4.1$ ls -ltrh /opt/PostgreSQL/9.3/data/core*
-rw-------. 1 postgres postgres 143M Mar  8 14:33 /opt/PostgreSQL/9.3/data/core.49091
4. We can attach gdb to the core dump and collect a stack trace using the command below.
-bash-4.1$ gdb -q -c /opt/PostgreSQL/9.3/data/core.49091 /opt/PostgreSQL/9.3/bin/postgres
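
Once gdb has loaded the core, a stack trace can be pulled with something like the following (a sketch; the actual frames will depend on the crash):

(gdb) bt
(gdb) bt full
(gdb) quit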

To get a readable stack trace, the binaries should have been built with debug symbols; I'm not the best person to go deeper into that, so I'll stop here. ;-)

Saturday 1 March 2014

Huge archive generation with PostgreSQL cluster??

Recently I found huge archive log generation on one of my client's servers: around 3500 files per day, which needs 3500 * 16 MB ≈ 54 GB of space every day, and disk space was running low on that server. The first thing that came to my mind was the checkpoint parameters. I had a look at them, and they were at their defaults, so I changed them as below:

checkpoint_segments = 180
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9 

I monitored for a couple of days and found that archive log generation decreased by about 50% (to around 1700 files per day).
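
If you want to measure WAL generation yourself before and after such a change, one rough approach (a sketch using functions available in 9.3; the starting location shown is illustrative) is to note the current WAL location and diff it later:

postgres=# select pg_current_xlog_location();   -- suppose this returns 0/4000000
postgres=# -- after the window you want to measure:
postgres=# select pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), '0/4000000'));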

With the above parameter settings, archive log generation should be minimal. However, if it is not, here are some general suggestions to look at:

-- Make sure that you don't UPDATE a row to the same values it already has. The row will be WAL-logged even if the old and new values match. One way to handle this is to make sure that any UPDATE statement which might set a column to its existing value includes in its WHERE clause a test such as old_value IS DISTINCT FROM new_value. Alternatively, you can create a trigger on the problem tables using the Postgres built-in function suppress_redundant_updates_trigger (see http://www.postgresql.org/docs/9.3/static/functions-trigger.html and the sketch below).
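
A sketch of such a trigger on a hypothetical table called problem_table (the trigger name is arbitrary; it is usually prefixed with "z" so it sorts, and therefore fires, last):

CREATE TRIGGER z_suppress_redundant_updates
BEFORE UPDATE ON problem_table
FOR EACH ROW
EXECUTE PROCEDURE suppress_redundant_updates_trigger();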

-- Use one UPDATE statement instead of several where possible. If updates to different columns or from multiple processing steps can be combined, this can reduce WAL logging.
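
For example (hypothetical table and columns), instead of two statements touching the same row:

UPDATE orders SET status = 'shipped' WHERE id = 42;
UPDATE orders SET shipped_at = now() WHERE id = 42;

combine them into one:

UPDATE orders SET status = 'shipped', shipped_at = now() WHERE id = 42;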

-- Consider using UNLOGGED or TEMPORARY tables where possible. For example, if data is inserted into the database and updated right away, perform the initial steps in an UNLOGGED or TEMPORARY table, and INSERT into the main table once the data has been prepared.
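
A sketch of that pattern with hypothetical table names:

CREATE UNLOGGED TABLE staging_events (LIKE events INCLUDING DEFAULTS);
-- load and massage the data here; changes to the unlogged table are not WAL-logged
INSERT INTO events SELECT * FROM staging_events;   -- only this final insert is WAL-logged
DROP TABLE staging_events;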

-- Identify and eliminate unused indexes. Index changes are WAL-logged so that indexes can be recovered on a crash. The storage space needed for indexes can be more than the storage space needed for the data itself, with a corresponding increase in generated WAL.
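
One query that can help spot candidates (a sketch; idx_scan = 0 only means the index has not been used since statistics were last reset, so review carefully before dropping anything):

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;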