Saturday 15 November 2014

Finally, its.... Slony !! (switchover and failover)


Slony is a great PostgreSQL replication tool and it’s used by many organizations. The tool works with different versions of Postgres, and it’s easy to upgrade with minimum downtime – sometimes with zero downtime!

Recently, I was working with a customer on a switchover and failover of Slony. I have seen a lot of bloggers post about installing and configuring Slony so I thought I would share my experience.

The customer wanted to upgrade their database from PostgreSQL 8.4 to v9.2. They had about 400 tables and wanted one set for each table, hence about 400 sets.

Below are the steps I took. Just for convenience, I'm using 10 tables/sets to explain.

Create tables using below script in source(8.4) and target(9.3) databases:
source=# select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);';

Inserted values using below script in source database:
source=# select 'insert into slony_tab_'||a||' values (generate_series(1,100));' from generate_series(1,10) a;


Configure Slony using below scripts:

1. Init cluser script
#################################################################################################

cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
init cluster (id = 1 , comment = 'Primary Node For the Slave postgres');
#Setting Store Nodes ...
store node (id = 2, event node = 1 , comment = 'Slave Node For The Primary postgres');
#Setting Store Paths ...
echo  'Stored all nodes in the slony catalogs';
store path(server = 1 , client = 2, conninfo = 'host=127.0.0.1 dbname=source user=postgres port=5434');
store path(server = 2, client = 1 , conninfo = 'host=127.0.0.1 dbname=target user=postgres port=5432');
echo  'Stored all Store Paths for Failover and Switchover into slony catalogs ..';

#################################################################################################
2. Create Set Script
#################################################################################################

cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try { create set (id = 1 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set1 for upgrade!'; exit 1;}
try { create set (id = 2 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set2 for upgrade!'; exit 1;}
try { create set (id = 3 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set3 for upgrade!'; exit 1;}
try { create set (id = 4 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set4 for upgrade!'; exit 1;}
try { create set (id = 5 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set5 for upgrade!'; exit 1;}
try { create set (id = 6 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set6 for upgrade!'; exit 1;}
try { create set (id = 7 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set7 for upgrade!'; exit 1;}
try { create set (id = 8 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set8 for upgrade!'; exit 1;}
try { create set (id = 9 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set9 for upgrade!'; exit 1;}
try { create set (id = 10 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set10 for upgrade!'; exit 1;}
set add table (set id = 1 ,origin = 1 , id = 1, full qualified name = 'public.slony_tab_1', comment = 'Table slony_tab_1 with primary key');
set add table (set id = 2 ,origin = 1 , id = 2, full qualified name = 'public.slony_tab_2', comment = 'Table slony_tab_2 with primary key');
set add table (set id = 3 ,origin = 1 , id = 3, full qualified name = 'public.slony_tab_3', comment = 'Table slony_tab_3 with primary key');
set add table (set id = 4 ,origin = 1 , id = 4, full qualified name = 'public.slony_tab_4', comment = 'Table slony_tab_4 with primary key');
set add table (set id = 5 ,origin = 1 , id = 5, full qualified name = 'public.slony_tab_5', comment = 'Table slony_tab_5 with primary key');
set add table (set id = 6 ,origin = 1 , id = 6, full qualified name = 'public.slony_tab_6', comment = 'Table slony_tab_6 with primary key');
set add table (set id = 7 ,origin = 1 , id = 7, full qualified name = 'public.slony_tab_7', comment = 'Table slony_tab_7 with primary key');
set add table (set id = 8 ,origin = 1 , id = 8, full qualified name = 'public.slony_tab_8', comment = 'Table slony_tab_8 with primary key');
set add table (set id = 9 ,origin = 1 , id = 9, full qualified name = 'public.slony_tab_9', comment = 'Table slony_tab_9 with primary key');
set add table (set id = 10 ,origin = 1 , id = 10, full qualified name = 'public.slony_tab_10', comment = 'Table slony_tab_10 with primary key');

#################################################################################################

3. Starting Slon Processes
#################################################################################################
/opt/PostgreSQL/9.2/bin/slon -s 1000 -d2 shadow 'host=127.0.0.1 dbname=source user=postgres port=5434' > /tmp/node1.log 2>&1 &
/opt/PostgreSQL/8.4/bin/slon -s 1000 -d2 shadow 'host=127.0.0.1 dbname=target user=postgres port=5432' > /tmp/node2.log 2>&1 &

#################################################################################################

4. Subscribing the sets.
#################################################################################################
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try { subscribe set (id = 1, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 1';
try { subscribe set (id = 2, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 2';
try { subscribe set (id = 3, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 3';
try { subscribe set (id = 4, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 4';
try { subscribe set (id = 5, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 5';
try { subscribe set (id = 6, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 6';
try { subscribe set (id = 7, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 7';
try { subscribe set (id = 8, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 8';
try { subscribe set (id = 9, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 9';
try { subscribe set (id = 10, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 10';
#################################################################################################

Till here it is normal, I mean, all can do very easily.... and of-course many sources you can find. However switchover.. failover !!

If you try to insert values in tables of "target" database, it throws below error:
target=# insert into slony_tab_1 values (11);
ERROR:  Slony-I: Table slony_tab_1 is replicated and cannot be modified on a subscriber node - role=0

Ah, its time to SwitchOver and FailOver...

Let us start with Switch-Over script --
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
lock set (id = 1, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 1, old origin = 1, new origin = 2); echo 'Set 1 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 2, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 2, old origin = 1, new origin = 2); echo 'Set 2 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 3, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 3, old origin = 1, new origin = 2); echo 'Set 3 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 4, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 4, old origin = 1, new origin = 2); echo 'Set 4 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 5, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 5, old origin = 1, new origin = 2); echo 'Set 5 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 6, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 6, old origin = 1, new origin = 2); echo 'Set 6 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 7, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 7, old origin = 1, new origin = 2); echo 'Set 7 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 8, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 8, old origin = 1, new origin = 2); echo 'Set 8 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 9, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 9, old origin = 1, new origin = 2); echo 'Set 9 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 10, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 10, old origin = 1, new origin = 2); echo 'Set 10 Has Been Moved From Origin Node 1 To 2 ';

Executing this script gives you this:
-bash-4.1$ /opt/PostgreSQL/9.2/bin/slonik /tmp/Switchover_script.slonik
/tmp/Switchover_script.slonik:4: Set 1 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:5: Set 2 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:6: Set 3 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:7: Set 4 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:8: waiting for event (1,5000000211) to be confirmed on node 2
/tmp/Switchover_script.slonik:8: Set 5 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:9: Set 6 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:10: Set 7 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:11: Set 8 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:12: waiting for event (1,5000000224) to be confirmed on node 2
/tmp/Switchover_script.slonik:12: Set 9 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:13: Set 10 Has Been Moved From Origin Node 1 To 2

-- Now try to insert values in tables of "target" database.. that should allow you.. :-)
target=# insert into slony_tab_1 values(21);
INSERT 0 1

Wow, its done.. isn't it easy.. !!
And now go ahead with failover.. its yours :P
Below is the script for FailOver...
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try {failover (id = 1, backup node = 2 );} on error { echo 'Failure Of The Failover For The Set 1  to 2 ';exit 1; }echo 'Failover Has been performed from 1 to 2';

Check if replication is happening..it should not !
so we are done with Switchover and Failover..
Hope this helps to someone. Any comments or suggestion would be appreciated !


6 comments:

  1. Thanks for the wonderful explanation baaji :) Great Work

    ReplyDelete
    Replies
    1. Thanks Baji for such handy reference.

      I have a question here:

      Once failover is done, your slave becomes master (with read/write enabled). However, now there is no HA setup. You are left with only new master (old slave, as standalone machine). So, what are the upper level steps to add/bring new slave (old master) back in the replication?

      Thanks to answer my question.

      -Alok Mishra

      Delete
  2. Nice post baji..
    I have two questions here, plz if you could answer at your convenience:

    1> While running failover command on Slave (while doing failover), how does slave node communicate with master node (as Master node is down)? Or, Is it not required?

    2> Once failover is done, your slave becomes new master (with read/write enabled). However, now there is no HA setup. Means, you are only left with your new master (as a stand alone server). So, what are the upper level steps to add the older node or new node (as slave) to start the slony replication again?

    Thanks for your time to answer my questions.

    -Alok

    ReplyDelete
  3. Hi Baji,
    can we do failover on windows-7 using Slony tool?
    Yours suggestion would be highly appreciated.

    ReplyDelete