Saturday, 5 July 2014

Updating pg_cast helps, sometimes !!

I recently saw one of our customers migrating a table from SQL Server to PostgreSQL using EnterpriseDB's Migration Toolkit (MTK). This table has a boolean datatype column. During migration, MTK converts the datatype "boolean" to "bit" in PostgreSQL, and the process was taking 6 hrs to complete. The customer wanted to change the datatype from "bit" to "integer", and the ALTER command for changing the type was taking another 6 hrs in PostgreSQL. If he migrates only the structure to PostgreSQL first and then changes the type from "bit" to "integer", the data load is rejected with the error below. If the load were allowed, the whole migration would take only 6 hrs, since there would be no need to alter the type after the data load.

ERROR: column "hidehelm" is of type integer but expression is of type boolean
 Hint: You will need to rewrite or cast the expression.

So I found a workaround: make the cast from "boolean" to "integer" implicit by updating the "pg_cast" catalog as shown below. With this, he can load the boolean data directly into the integer column, which saves the time of altering the type from "bit" to "integer" after migrating.

postgres=# insert into tarik values (1::boolean);
ERROR:  column "t" is of type integer but expression is of type boolean
LINE 1: insert into tarik values (1::boolean);
HINT:  You will need to rewrite or cast the expression.
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | e           | f
(1 row)
postgres=# update pg_cast set castcontext ='i' where castsource='boolean'::regtype and casttarget='int4'::regtype;
UPDATE 1
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
 castsource | casttarget | castfunc | castcontext | castmethod 
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | i           | f
(1 row)
postgres=# insert into tarik values (1::boolean);
INSERT 0 1
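
A narrower variant may be worth considering. In pg_cast, castcontext can be 'e' (explicit casts only), 'a' (also applied implicitly when assigning to a target column) or 'i' (applied implicitly in any expression). Since the load only assigns boolean values to an integer column, the assignment context should be enough. This is a sketch that was not part of the original migration; it reuses the test table "tarik" from the session above:

```sql
-- Assumption: run as a superuser, on the same test table "tarik" (t integer).
-- 'a' allows the cast implicitly on assignment (INSERT / UPDATE targets),
-- but not inside arbitrary expressions, unlike 'i'.
update pg_cast set castcontext = 'a'
 where castsource = 'boolean'::regtype and casttarget = 'int4'::regtype;

-- The boolean value is now cast to integer while being stored.
insert into tarik values (true);
```

Compared with 'i', this limits where the planner may silently turn a boolean into an integer.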

Whatever the side effects of this workaround (I'm not aware of any), it worked, and the migration took only 6 hrs. Of course, the customer rolled back this setting after migrating the table. Updating catalogs is very dangerous, so maybe I should have concentrated on how to reduce the time of the ALTER command after migration. Anyway, it worked, so I was happy !! ;-)
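
For completeness, the rollback of the setting can be sketched as below. It assumes the original castcontext was 'e', as in the first pg_cast query in the session above, and that the statements are run by a superuser:

```sql
-- Put the boolean -> int4 cast back to explicit-only.
update pg_cast set castcontext = 'e'
 where castsource = 'boolean'::regtype and casttarget = 'int4'::regtype;

-- Verify that the catalog row is back to its original state.
select castsource::regtype, casttarget::regtype, castcontext
  from pg_cast
 where castsource = 'boolean'::regtype and casttarget = 'int4'::regtype;
```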

Thanks for any suggestions/comments.

Friday, 4 July 2014

Oops, I corrupted my table, of course just to recover salvaged data.


The way I worded the title might be confusing. On reading "I corrupted my table", everyone starts with "Crap !! Why did you do that !!", so let me justify it.
I see many customers who come to us to recover corrupted tables without any backup. In such cases it is hard to recover the tables completely and it takes a lot of work; however, we can recover the salvageable data if they don't care about the corrupted rows.

Let me corrupt the table first.. :-)

I created a million-row table called "to_be_damaged":
postgres=# select count(*) from to_be_damaged ;
  count 
---------
 1000000
(1 row)
postgres=# select relfilenode,relname from pg_class where relname='to_be_damaged';
 relfilenode |    relname    
-------------+---------------
      461257 | to_be_damaged
(1 row)
I used "hexedit" to damage it: open the relfilenode file at the OS level with hexedit, pick a line that is the start of an 8K boundary, and type hex DE AD BE EF across it.
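
To locate the file to open, a minimal sketch: pg_relation_filepath() returns the path of the relation's file relative to the data directory, and the data_directory setting gives the directory itself.

```sql
-- Path of the table's main data file, relative to the data directory.
select pg_relation_filepath('to_be_damaged');

-- The data directory that the path above is relative to.
show data_directory;
```

Only try this on a throwaway cluster, and stop the server (or at least make sure the page is not cached in shared buffers) before editing the file.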

postgres=# select count(*) from to_be_damaged ;
ERROR:  invalid page in block 0 of relation base/12896/461257

Now create a table "salvaged" with the same structure to hold the rows recovered from the "to_be_damaged" table.
postgres=# create table salvaged(t int);
CREATE TABLE
I prepared the function below, which copies the rows that are still salvageable:
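create or replace function salvage_damaged()
  returns void
  language plpgsql
as $$
declare
  pageno int;
  tupno int;
  pos tid;
begin
  <<pageloop>>
  for pageno in 0..35930 loop  -- pg_class.relpages for the damaged table
    -- 1000 is a generous upper bound on tuples per 8K page
    for tupno in 1..1000 loop
      -- build the physical row address (page, tuple) as a tid
      pos := ('(' || pageno || ',' || tupno || ')')::tid;
      begin
        -- copy the row if it exists; a ctid with no row simply matches nothing
        insert into salvaged select * from to_be_damaged where ctid = pos;
      exception
        when sqlstate 'XX001' then  -- data_corrupted: the whole page is unreadable
          raise warning 'skipping page %', pageno;
          continue pageloop;
        when others then  -- any other error: give up on this row only
          raise warning 'skipping row %', pos;
      end;
    end loop;
  end loop;
end;
$$;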
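
The upper bound of the page loop is hard-coded from pg_class.relpages. As a sketch of how that number could be obtained for another table (the table name here is simply the one used in this post), either of these queries should do:

```sql
-- Page count recorded in the catalog; refreshed by VACUUM / ANALYZE,
-- so it can be stale.
select relpages from pg_class where relname = 'to_be_damaged';

-- Page count derived from the current on-disk size of the table.
select pg_relation_size('to_be_damaged') / current_setting('block_size')::int as pages;
```

Pages are numbered from 0, so the last valid page number is one less than the count; looping one page too far is harmless because a ctid past the end of the table matches no rows.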

Now run the function to copy the salvageable rows:
postgres=# select salvage_damaged();
WARNING: skipping page 0
salvage_damaged
-----------------

(1 row)
postgres=# select count(*) from salvaged ;
count
----------
12999815
(1 row)

postgres=# select 13000000-12999815;
?column?
----------
185
(1 row)

I hope it helps someone. Thanks for reading.