Monday, 3 October 2016

A new book from Packt Publishing! Guess what!

One of my dreams..!! .. not expected..!!.. too soon..!!
I am glad to announce that my first book as an author, PostgreSQL Development Essentials, has been published.
The purpose of this book is to teach database developers fundamental practices and techniques for programming database applications with PostgreSQL. It is targeted at database developers who have basic experience developing database applications with the system but want a deeper understanding of how to implement programmatic functions with PostgreSQL.
Thanks to Packt Publishing for giving me this opportunity. Special thanks to Izzat Contractor for choosing me and to Dinesh Kumar for helping with the writing. Thanks to Anish Sukumaran, Nitin Dasan, and Sunith Shetty for working with me.

Saturday, 18 April 2015

Aha, you can count the rows for the \copy command.

We all know that the \copy command does not return anything when you load data. The idea is to capture how many records got loaded into the table through the \copy command.
Here's a shell script that should work:
echo number of rows in input: $(wc -l data.in)
( echo "\copy test from stdin delimiter '|';" ; cat data.in  ) | psql -v ON_ERROR_STOP=1
echo psql exit code $?

If the exit code printed is 0, everything went well, and the value printed by the first echo indicates how many rows were inserted. If the printed exit code is non-zero, no rows were inserted, of course. If the exit code printed is 3, then the data being copied had some error.

From the docs: if the exit code printed is 1 or 2, then something went wrong in psql (like running out of memory) or the server connection was broken, respectively. The following facts play a role in the above script:

.) COPY (and hence \copy) expects the input records to be terminated by a newline. So counting the number of newlines in the input is a reliable way of counting the records inserted.
.) psql will exit with code 3 if and only if there's an error in the script and ON_ERROR_STOP is set.
Note: This seems to not apply to the `psql -c "sql command"` construct.
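
To make this reusable, here is a minimal sketch of a wrapper script built on the same idea; the table name, delimiter, and data file are passed as hypothetical parameters, so adapt it to your setup:

#!/bin/bash
# copy_count.sh -- load a delimited file with \copy and report the row count.
# Usage: ./copy_count.sh <table> <delimiter> <datafile>
# Assumes the usual PG* environment variables (PGDATABASE, PGHOST, ...) are set.
table=$1
delim=$2
datafile=$3

rows=$(wc -l < "$datafile")    # records are newline-terminated, so lines = records

( echo "\\copy $table from stdin delimiter '$delim';" ; cat "$datafile" ) | psql -v ON_ERROR_STOP=1 -q
rc=$?

if [ "$rc" -eq 0 ]; then
    echo "loaded $rows rows into $table"
else
    echo "load failed (psql exit code $rc), no rows were inserted" >&2
fi
exit $rc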

# Example clean input

$ psql -c "create table test(a text,b int);"
CREATE TABLE
$ cat data.in 
column1|2
column1|2
column1|2
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in  ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $? 
number of rows in input: 6 data.in
psql exit code 0

# Example malformed input
$ cat data.in 
column1|2
column1|2
column1|2c
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in  ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $? 
number of rows in input: 6 data.in
ERROR:  invalid input syntax for integer: "2c"
CONTEXT:  COPY test, line 3, column b: "2c"
psql exit code 3
 
I hope this helps someone.

Woohoo!! Packt Publishing has published a book on troubleshooting PostgreSQL databases.

(Baji is trying to impress 'X')
==========
Baji: Packt Publishing has published a book on troubleshooting PostgreSQL database.
 _X_: Uh, so what(!?). It published four other PostgreSQL books this year!
Baji: Yeah, I know!
 _X_: Then why do you care about thisssss.
Baji: I should care about it, as I was part of the technical reviewing team.. :(
 _X_: Oh really! That's fantastic.. Congratulations!
==========

Note: Finally, Baji impressed _X_ :-)

Ok, in reality, I am glad to announce that "My first book as a Technical Reviewer has been published by Packt Publishing" ;-)

https://www.packtpub.com/big-data-and-business-intelligence/troubleshooting-postgresql
http://my.safaribooksonline.com/book/databases/postgresql/9781783555314/troubleshooting-postgresql/pr02_html

The author of this book is Hans-Jürgen Schönig; he has a couple of other PostgreSQL books as well.

This book provides a series of valuable troubleshooting solutions for database administrators responsible for maintaining a PostgreSQL database. It is aimed at PostgreSQL administrators who have developed an application with PostgreSQL and need solutions to common administration problems they encounter when managing a database instance. So give it a try ;-)

I would like to thank my loving parents for everything they did for me. Personal time always belongs to family, and I did this in my personal time.

I want to thank Packt Publishing for giving me this opportunity, and thanks to Sanchita Mandal and Paushali Desai for choosing me and working with me on this project.

Last but not least, I would like to thank Dinesh Kumar, who taught me PostgreSQL and inspired me to do this. :)

Saturday, 7 February 2015

Someone asked me.. "how to change the location of core file generated by postgres".

I remember someone asking me about changing the location of the core file generated by postgres. We all know it is created under PGDATA by default; however, some people want to avoid that, because the core file can sometimes be huge and eat up all the space in the data directory, which in turn leads to a shutdown of the cluster. So I thought it would be good to have an article which shows how to change the location.

On Linux servers, core file generation can be enabled by running "ulimit -c unlimited" before starting the server, or by using the -c option to pg_ctl start. On Windows, if you're running PostgreSQL 9.1, you can create a "crashdumps" subdirectory inside the data directory.  On earlier versions, it's harder.

Before enabling/disabling, if you want to verify whether your cluster is already generating core files or not, check this. OK, I have enabled core file generation for my cluster; let's change the location. Here are the detailed steps:

-- Start the cluster using the "-c" option (the cluster user must be allowed to generate core files).
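A minimal sketch of that step (the binary and data paths follow the ones used later in this post; adjust them to your installation):

# run as the cluster owner (postgres)
ulimit -c unlimited                                    # allow core files in this shell
/opt/PostgreSQL/9.3/bin/pg_ctl -c -D /opt/PostgreSQL/9.3/data start    # -c (--core-files) lets server processes dump core
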
-- Check the core file pattern as the root user using the command below:
[root@localhost ~]# sysctl kernel.core_pattern
kernel.core_pattern = |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e

-- Change kernel.core_pattern to the location in which you want to generate core files (contact your admin to do that). Please note that the location given in kernel.core_pattern must be writable by the cluster user, or else the kernel will decline to write a core file there.
[root@localhost ~]# echo "kernel.core_pattern=/tmp/core-%e-%s-%u-%g-%p-%t" >> /etc/sysctl.conf
[root@localhost ~]# tail -5 /etc/sysctl.conf
# max OS transmit buffer size in bytes
net.core.wmem_max = 1048576
fs.file-max = 6815744
########
kernel.core_pattern=/tmp/core-%e-%s-%u-%g-%p-%t
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# sysctl -p |tail -5
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
fs.file-max = 6815744
kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
[root@localhost ~]#
%% - A single % character
%p - PID of dumped process
%u - real UID of dumped process
%g - real GID of dumped process
%s - number of signal causing dump
%t - time of dump (seconds since 0:00h, 1 Jan 1970)
%h - hostname (same as 'nodename' returned by uname(2))
%e - executable filename

-- Verify that the cluster is set up to generate core files.
bash-4.1$ ps -ef|grep data|grep "9.3"
504       3405     1  0 20:44 ?        00:00:00 /opt/PostgresPlus/9.3AS/bin/edb-postgres -D /opt/PostgresPlus/9.3AS/data
postgres  6155     1  0 21:37 pts/0    00:00:00 /opt/PostgreSQL/9.3/bin/postgres -D /opt/PostgreSQL/9.3/data
bash-4.1$ grep -i core /proc/6155/limits
Max core file size        unlimited            unlimited            bytes  
bash-4.1$

-- Check whether a cluster crash creates core files in the given location. Let me kill a process to generate a core dump.
bash-4.1$ ps -ef|grep 6155
postgres  6155     1  0 21:37 pts/0    00:00:00 /opt/PostgreSQL/9.3/bin/postgres -D /opt/PostgreSQL/9.3/data
postgres  6156  6155  0 21:37 ?        00:00:00 postgres: logger process                                  
postgres  6158  6155  0 21:37 ?        00:00:00 postgres: checkpointer process                            
postgres  6159  6155  0 21:37 ?        00:00:00 postgres: writer process                                  
postgres  6160  6155  0 21:37 ?        00:00:00 postgres: wal writer process                              
postgres  6161  6155  0 21:37 ?        00:00:00 postgres: autovacuum launcher process                    
postgres  6162  6155  0 21:37 ?        00:00:00 postgres: stats collector process                        
postgres  6527  6001  0 21:38 pts/0    00:00:00 grep 6155

bash-4.1$ kill -ABRT 6159  -- Killing writer process to get core dump.
bash-4.1$
bash-4.1$
bash-4.1$ ls -ltrh /tmp/core*postgre*
-rw-------. 1 postgres postgres 143M Feb  7 21:41 /tmp/core-postgres-6-501-501-6159-1423325468
bash-4.1$
bash-4.1$ date
Sat Feb  7 21:41:25 IST 2015

-- Check the log entries
-bash-4.1$ grep "6159"  postgresql-2015-02-07_213749.log
2015-02-07 21:41:09 IST LOG:  background writer process (PID 6159) was terminated by signal 6: Aborted

Wow, it was generated in the new location. Any comments/suggestions are most welcome.

Sunday, 16 November 2014

Ah, does it mean bad hardware or a kernel bug?... Uh, I just want to avoid it.

I have seen many customers coming up with the errors below and asking for the root cause. They wonder about the reasons behind them and say, "Ah, it's because of bad hardware or a kernel bug.. I hate it, I just want to know how to avoid these".

Let's start with this:
ERROR: could not read block 4285 in file "base/xxxxx/xxxx": read only 0 of 8192 bytes

... have rarely been known to be caused by bugs in specific Linux kernel versions.  Such errors are more often caused by bad hardware, anti-virus software, improper backup/restore procedures, etc.

One very common cause for such corruption lately seems to be incorrect backup and restore. (For example, failure to exclude or delete all files from the pg_xlog directory can cause problems like this, or using filesystem "snapshots" which aren't really atomic.) The history of the database, including any recoveries from backup or promotion of replicas to primary, could indicate whether this is a possible cause. Faulty hardware is another fairly common cause, including SANs. If fsync or full_page_writes were ever turned off for the cluster, that could also explain it.

It is good to establish the cause where possible, so that future corruption can be avoided, but to recover, the cluster should normally be dumped with pg_dumpall and/or pg_dump and restored into a freshly created (via initdb) cluster on a machine which is not suspected of causing corruption. It may be possible to fix up or drop and recreate individual damaged objects, but when doing that it can be hard to be sure that the last of the corruption (or the cause of the initial corruption) has been eliminated.
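
A minimal sketch of that dump-and-restore recovery, assuming a fresh, trusted machine with PostgreSQL installed (host names, ports, and paths here are only illustrative):

# on the new, trusted machine: create and start a fresh cluster
initdb -D /var/lib/pgsql/newdata
pg_ctl -D /var/lib/pgsql/newdata -o "-p 5433" start
# dump everything from the suspect cluster and restore it into the new one
pg_dumpall -h old-host -p 5432 | psql -p 5433 -d postgres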

Here is a nice article from Robert Haas on why-is-my-database-corrupted.

Errors like this:
ERROR: unexpected data beyond EOF in block xxxx of relation pg_tblspc/xxxx
HINT: This has been seen to occur with buggy kernels; consider updating your system.

... are most often caused by Linux kernel bugs. If you are seeing both types of errors, it is likely that a hardware problem (like bad RAM) is the cause of both, although other causes cannot be ruled out. It is recommended that you schedule a maintenance window and run thorough hardware checks. The latter error message has never been known to be caused by a bug in PostgreSQL itself or by improper backup/restore; it can only be caused by an OS bug or something which is interfering with OS-level actions -- like hardware problems or AV software. The kernel bug can affect anything that adds pages to a table or its indexes. It is a race condition in the kernel, so it will probably be infrequent, and it will be hard to reproduce or to predict when it will be encountered. It can be caused by an fallocate() bug, which is indeed fixed in the release below:

6.5, also termed Update 5, 21 November 2013 (kernel 2.6.32-431): https://rhn.redhat.com/errata/RHSA-2013-1645.html

Given all the distributions of Linux and the different timings with which each has incorporated different bug fixes, it is not feasible to give a list of Linux versions that are known to work well. A more practical approach is to find out the exact version of Linux being used and then do a web search for known bugs in that version; most often the main source for that is the list of bugs fixed in later versions. The bug which could cause this error was fixed several years ago in all major distributions, so any bug-fix version of Linux released in the last two years is unlikely to contain it; simply applying the available bug fixes for your distribution should rule out OS problems, unless this is a new OS bug that has not yet been run into. If you continue to see this error while running with the latest OS bug fixes, the most likely cause is bad hardware.
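
For example, a quick sketch of finding the exact kernel version on an RPM-based system (commands assume RHEL/CentOS):

# report the running kernel and the installed kernel packages
uname -r
rpm -q kernel
# compare against the fixed kernel noted above (2.6.32-431 or newer)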

I have googled around for "suggestions to avoid corruption" and found this article from Craig Ringer. Here are some suggestions made by community/core team members:

** Maintain rolling backups with proper ageing. For example, keep one a day for the last 7 days, then one a week for the last 4 weeks, then one a month for the rest of the year, then one a year.
** Use warm standby with log shipping and/or replication to maintain a live copy of the DB.
** If you want point-in-time recovery, keep a few days or weeks worth of WAL archives and a basebackup around. That'll help you recover from those "oops I meant DROP TABLE unimportant; not DROP TABLE vital_financial_records;" issues.
** Keep up to date with the latest PostgreSQL patch releases. Don't be one of those people still running 9.0.0 when 9.0.10 is out.
** Plug-pull test your system when you're testing it before going live. Put it under load with something like pgbench, then literally pull the plug out. If your database doesn't come back up fine you have hardware, OS or configuration problems.
** Don't `kill -9` the postmaster. It should be fine, but it's still not smart.
** ABSOLUTELY NEVER DELETE postmaster.pid
** Use good quality hardware with proper cooling and a good quality power supply. If possible, ECC RAM is a nice extra.
** Never, ever, ever use cheap SSDs. Use good quality hard drives or (after proper testing) high end SSDs. Read the SSD reviews periodically posted on this mailing list if considering using SSDs. Make sure the SSD has a supercapacitor or other reliable option for flushing its write cache on power loss. Always do repeated plug-pull testing when using SSDs.
** Use a solid, reliable file system. zfs-on-linux, btrfs, etc are not the right choices for a database you care about. Never, ever, ever use FAT32.
** If on Windows, do not run an anti-virus program on your database server. Nobody should be using it for other things or running programs on it anyway.
** Avoid RAID 5, mostly because the performance is terrible, but also because I've seen corruption issues with rebuilds from parity on failing disks.
** Use a good quality hardware RAID controller with a battery backup cache unit if you're using spinning disks in RAID. This is as much for performance as reliability; a BBU will make an immense difference to database performance.
** If you're going to have a UPS (you shouldn't need one as your system should be crash-safe), don't waste your money on a cheap one. Get a good online double-conversion unit that does proper power filtering. Cheap UPSs are just a battery with a fast switch, they provide no power filtering and what little surge protection they offer is done with a component that wears out after absorbing a few surges, becoming totally ineffective. Since your system should be crash-safe a cheap UPS will do nothing for corruption protection, it'll only help with uptime.

Saturday, 15 November 2014

Finally, it's.... Slony !! (switchover and failover)


Slony is a great PostgreSQL replication tool and it’s used by many organizations. The tool works with different versions of Postgres, and it’s easy to upgrade with minimum downtime – sometimes with zero downtime!

Recently, I was working with a customer on a switchover and failover of Slony. I have seen a lot of bloggers post about installing and configuring Slony so I thought I would share my experience.

The customer wanted to upgrade their database from PostgreSQL 8.4 to v9.2. They had about 400 tables and wanted one set for each table, hence about 400 sets.

Below are the steps I took. Just for convenience, I'm using 10 tables/sets to explain.

Create tables using the script below in the source (8.4) and target (9.2) databases:
source=# select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);';

Insert values using the script below in the source database:
source=# select 'insert into slony_tab_'||a||' values (generate_series(1,100));' from generate_series(1,10) a;
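
Note that these two SELECTs only generate the SQL statements; a small sketch of one way to actually execute the generated statements (connection options are illustrative) is:

# generate the statements with -At (unaligned, tuples only) and pipe them back into psql
psql -At -p 5434 -d source -c "select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);'" | psql -p 5434 -d source
psql -At -p 5432 -d target -c "select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);'" | psql -p 5432 -d target
psql -At -p 5434 -d source -c "select 'insert into slony_tab_'||a||' values (generate_series(1,100));' from generate_series(1,10) a" | psql -p 5434 -d source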


Configure Slony using below scripts:

1. Init cluster script
#################################################################################################

cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
init cluster (id = 1 , comment = 'Primary Node For the Slave postgres');
#Setting Store Nodes ...
store node (id = 2, event node = 1 , comment = 'Slave Node For The Primary postgres');
#Setting Store Paths ...
echo  'Stored all nodes in the slony catalogs';
store path(server = 1 , client = 2, conninfo = 'host=127.0.0.1 dbname=source user=postgres port=5434');
store path(server = 2, client = 1 , conninfo = 'host=127.0.0.1 dbname=target user=postgres port=5432');
echo  'Stored all Store Paths for Failover and Switchover into slony catalogs ..';

#################################################################################################
2. Create Set Script
#################################################################################################

cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try { create set (id = 1 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set1 for upgrade!'; exit 1;}
try { create set (id = 2 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set2 for upgrade!'; exit 1;}
try { create set (id = 3 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set3 for upgrade!'; exit 1;}
try { create set (id = 4 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set4 for upgrade!'; exit 1;}
try { create set (id = 5 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set5 for upgrade!'; exit 1;}
try { create set (id = 6 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set6 for upgrade!'; exit 1;}
try { create set (id = 7 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set7 for upgrade!'; exit 1;}
try { create set (id = 8 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set8 for upgrade!'; exit 1;}
try { create set (id = 9 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set9 for upgrade!'; exit 1;}
try { create set (id = 10 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set10 for upgrade!'; exit 1;}
set add table (set id = 1 ,origin = 1 , id = 1, full qualified name = 'public.slony_tab_1', comment = 'Table slony_tab_1 with primary key');
set add table (set id = 2 ,origin = 1 , id = 2, full qualified name = 'public.slony_tab_2', comment = 'Table slony_tab_2 with primary key');
set add table (set id = 3 ,origin = 1 , id = 3, full qualified name = 'public.slony_tab_3', comment = 'Table slony_tab_3 with primary key');
set add table (set id = 4 ,origin = 1 , id = 4, full qualified name = 'public.slony_tab_4', comment = 'Table slony_tab_4 with primary key');
set add table (set id = 5 ,origin = 1 , id = 5, full qualified name = 'public.slony_tab_5', comment = 'Table slony_tab_5 with primary key');
set add table (set id = 6 ,origin = 1 , id = 6, full qualified name = 'public.slony_tab_6', comment = 'Table slony_tab_6 with primary key');
set add table (set id = 7 ,origin = 1 , id = 7, full qualified name = 'public.slony_tab_7', comment = 'Table slony_tab_7 with primary key');
set add table (set id = 8 ,origin = 1 , id = 8, full qualified name = 'public.slony_tab_8', comment = 'Table slony_tab_8 with primary key');
set add table (set id = 9 ,origin = 1 , id = 9, full qualified name = 'public.slony_tab_9', comment = 'Table slony_tab_9 with primary key');
set add table (set id = 10 ,origin = 1 , id = 10, full qualified name = 'public.slony_tab_10', comment = 'Table slony_tab_10 with primary key');

#################################################################################################

3. Starting Slon Processes
#################################################################################################
/opt/PostgreSQL/9.2/bin/slon -s 1000 -d2 shadow 'host=127.0.0.1 dbname=source user=postgres port=5434' > /tmp/node1.log 2>&1 &
/opt/PostgreSQL/8.4/bin/slon -s 1000 -d2 shadow 'host=127.0.0.1 dbname=target user=postgres port=5432' > /tmp/node2.log 2>&1 &

#################################################################################################

4. Subscribing the sets.
#################################################################################################
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try { subscribe set (id = 1, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 1';
try { subscribe set (id = 2, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 2';
try { subscribe set (id = 3, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 3';
try { subscribe set (id = 4, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 4';
try { subscribe set (id = 5, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 5';
try { subscribe set (id = 6, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 6';
try { subscribe set (id = 7, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 7';
try { subscribe set (id = 8, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 8';
try { subscribe set (id = 9, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 9';
try { subscribe set (id = 10, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 10';
#################################################################################################
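
The slonik scripts above (steps 1, 2, and 4) are plain slonik input; a sketch of how they would be run in order (the file names are just placeholders):

/opt/PostgreSQL/9.2/bin/slonik /tmp/init_cluster.slonik
/opt/PostgreSQL/9.2/bin/slonik /tmp/create_set.slonik
# start the slon daemons from step 3, then subscribe:
/opt/PostgreSQL/9.2/bin/slonik /tmp/subscribe_set.slonik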

Up to here it is routine; I mean, anyone can do it very easily.... and of course there are many sources you can find. However, switchover.. failover !!

If you try to insert values into tables of the "target" database, it throws the error below:
target=# insert into slony_tab_1 values (11);
ERROR:  Slony-I: Table slony_tab_1 is replicated and cannot be modified on a subscriber node - role=0

Ah, it's time to SwitchOver and FailOver...

Let us start with the Switch-Over script --
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
lock set (id = 1, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 1, old origin = 1, new origin = 2); echo 'Set 1 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 2, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 2, old origin = 1, new origin = 2); echo 'Set 2 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 3, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 3, old origin = 1, new origin = 2); echo 'Set 3 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 4, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 4, old origin = 1, new origin = 2); echo 'Set 4 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 5, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 5, old origin = 1, new origin = 2); echo 'Set 5 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 6, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 6, old origin = 1, new origin = 2); echo 'Set 6 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 7, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 7, old origin = 1, new origin = 2); echo 'Set 7 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 8, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 8, old origin = 1, new origin = 2); echo 'Set 8 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 9, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 9, old origin = 1, new origin = 2); echo 'Set 9 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 10, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 10, old origin = 1, new origin = 2); echo 'Set 10 Has Been Moved From Origin Node 1 To 2 ';

Executing this script gives you this:
-bash-4.1$ /opt/PostgreSQL/9.2/bin/slonik /tmp/Switchover_script.slonik
/tmp/Switchover_script.slonik:4: Set 1 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:5: Set 2 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:6: Set 3 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:7: Set 4 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:8: waiting for event (1,5000000211) to be confirmed on node 2
/tmp/Switchover_script.slonik:8: Set 5 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:9: Set 6 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:10: Set 7 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:11: Set 8 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:12: waiting for event (1,5000000224) to be confirmed on node 2
/tmp/Switchover_script.slonik:12: Set 9 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:13: Set 10 Has Been Moved From Origin Node 1 To 2

-- Now try to insert values into tables of the "target" database.. it should be allowed now.. :-)
target=# insert into slony_tab_1 values(21);
INSERT 0 1

Wow, it's done.. isn't it easy.. !!
And now go ahead with the failover.. it's all yours :P
Below is the script for FailOver...
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try {failover (id = 1, backup node = 2 );} on error { echo 'Failure Of The Failover For The Set 1  to 2 ';exit 1; }echo 'Failover Has been performed from 1 to 2';
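
After a failover, the failed node is normally removed from the cluster; a sketch of that cleanup, wrapped in a shell here-document and assuming node 1 is the failed node, might look like this:

# tell Slony to forget the failed origin after the failover completes
/opt/PostgreSQL/9.2/bin/slonik <<'EOF'
cluster name = shadow;
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
drop node (id = 1, event node = 2);
EOF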

Check whether replication is still happening.. it should not!
So we are done with switchover and failover..
I hope this helps someone. Any comments or suggestions would be appreciated!


Saturday, 5 July 2014

Updating pg_cast helps, but only sometimes !!

I've seen one of our customers migrating a table from SQL Server to PostgreSQL using EnterpriseDB's Migration ToolKit. This table has a boolean datatype column. During migration, MTK converts the "boolean" datatype to "bit" in PostgreSQL, and the process was taking 6 hrs to complete. The customer wanted to change the datatype from "bit" to "integer", and the ALTER command to change the type was taking another 6 hrs in PostgreSQL. If he migrates only the structure to PostgreSQL first, and then changes the type from "bit" to "integer" before loading, the load fails with the error below. If the load were allowed, the whole migration would take only 6 hrs, as there would be no need to alter the type after the data load.

ERROR: column "hidehelm" is of type integer but expression is of type boolean
 Hint: You will need to rewrite or cast the expression.

So I found a workaround to cast from "boolean" to "integer" implicitly by updating the "pg_cast" catalog as below. With this, he can directly load the boolean data into the integer column, which saves the time of altering the type from bit to integer after migrating.

postgres=# insert into tarik values (1::boolean);
ERROR:  column "t" is of type integer but expression is of type boolean
LINE 1: insert into tarik values (1::boolean);
HINT:  You will need to rewrite or cast the expression.
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | e           | f
(1 row)
postgres=# update pg_cast set castcontext ='i' where castsource='boolean'::regtype and casttarget='int4'::regtype;
UPDATE 1
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
 castsource | casttarget | castfunc | castcontext | castmethod 
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | i           | f
(1 row)
postgres=# insert into tarik values (1::boolean);
INSERT 0 1

Irrespective of any side effects (which I'm not aware of) of this workaround, it worked and the migration took only 6 hrs. Of course, the customer rolled back this setting after migrating the table. Updating catalogs is very dangerous, so maybe I should have concentrated on how to reduce the time of the ALTER command after migration? Anyway, it worked, so I was happy !! ;-)
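
For reference, a minimal sketch of rolling the catalog back to its original behaviour (castcontext = 'e', i.e. explicit only, as shown in the first query above):

# restore the built-in boolean -> integer cast to explicit-only
psql -d postgres -c "update pg_cast set castcontext = 'e' where castsource = 'boolean'::regtype and casttarget = 'int4'::regtype;"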

Thanks for any suggestions/comments.