The way I started the title might be confusing, "I corrupted my table", so everyone starts with "Crap !! why did you do that !!, ...", so just to justify it.......
I see many customers coming for recovering the corrupted tables without any backup. In such cases, hard to recover the tables completely and it needs lot of work, however we can recover salvaged data if they dont care about corrupted rows.
Let me corrupt the table first.. :-)
I created a million-row table called "to_be_damaged"
postgres=# select count(*) from to_be_damaged ; count --------- 1000000 (1 row) postgres=# select relfilenode,relname from pg_class where relname='to_be_damaged'; relfilenode | relname -------------+--------------- 461257 | to_be_damaged (1 row)I've used "hexedit" to damage it. Open relfilenode file from OS level using hexedit and try picking a line which is the start of an 8K boundary and typing hex DE AD BE EF across it.
postgres=# select count(*) from to_be_damaged ; ERROR: invalid page in block 0 of relation base/12896/461257
Now create an identical table "salvaged" to recover salvaged data from "to_be_damaged" table.
postgres=# create table salvaged(t int); CREATE TABLEPrepared below function which copies the rows which are still salvageable:
create or replace function salvage_damaged() returns void language plpgsql as $$ declare pageno int; tupno int; pos tid; begin <<pageloop>> for pageno in 0..35930 loop -- pg_class.relpages for the damaged table for tupno in 1..1000 loop pos = ('(' || pageno || ',' || tupno || ')')::tid; begin insert into salvaged select * from to_be_damaged where ctid = pos; exception when sqlstate 'XX001' then raise warning 'skipping page %', pageno; continue pageloop; when others then raise warning 'skipping row %', pos; end; end loop; end loop; end; $$;
Now run the function to copy salvagable rows:
postgres# select salvage_damaged(); WARNING: skipping page 0 salvage_damaged ----------------- (1 row) postgres=# select count(*) from salvaged ; count ---------- 12999815 (1 row) postgres=# select 13000000-12999815; ?column? ---------- 185 (1 row)
I hope it helps someone. Thanks for reading.
Thanks, thanks a great idea, but here's a slight mod that makes the function a little more flexible:
ReplyDeleteCREATE OR REPLACE FUNCTION salvage_damaged_table(bad_table name)
returns void
language plpgsql
AS $$
DECLARE
bad_table ALIAS FOR $1;
totpages int;
tottuples bigint;
pageno int;
tupno int;
pos tid;
BEGIN
SELECT relpages, reltuples INTO totpages, tottuples
FROM pg_class
WHERE relname = quote_ident(bad_table)
AND relkind = 'r';
RAISE NOTICE 'totpages %, tottuples %', totpages::text, tottuples::text;
<>
for pageno in 0..totpages loop -- pg_class.relpages for the damaged table
for tupno in 1..tottuples loop
pos = ('(' || pageno || ',' || tupno || ')')::tid;
begin
insert into salvaged
select *
from my_bad_table -- <-- Replace with actual table name here.
where ctid = pos;
exception
when sqlstate 'XX001' then
raise warning 'skipping page %', pageno;
continue pageloop;
when others then
raise warning 'skipping row %, SQLSTATE %', pos, SQLSTATE::text;
end;
end loop;
end loop;
RETURN;
end;
$$;
Thanks for the suggestion, it looks cool !
Deletepostgres=# \i a.sql
DeleteCREATE FUNCTION
postgres=# select salvage_damaged_table('my_bad_table') ;
NOTICE: totpages 57712, tottuples 2433024
ERROR: invalid input syntax for type tid: "(0,65536)"
PL/pgSQL function salvage_damaged_table(character varying) line 20 at assignment
here :
for pageno in 0..totpages loop -- pg_class.relpages for the damaged table
for tupno in 1..tottuples loop
is wrong
tuple number can not be bigger than 65536
so
for pageno in 0..totpages loop -- pg_class.relpages for the damaged table
for tupno in 1..65535 loop
thanks
This comment has been removed by the author.
ReplyDeleteThanks Baji. Nice approach to get the data back from corrupted table.
ReplyDeleteThank you Dinesh !
DeleteGreat thanks Baji, very useful post!
ReplyDeletethanks Baji !!!
ReplyDeleteBaji , and how to fix the corrupted table ? I just see in blogs "Enable zero_damaged_pages and execute VACUUM FREEZE" ... nothing else
ReplyDeleteBaji , and how to fix the corrupted table? I just find "Enable zero_damaged_pages and execute VACUUM FREEZE" on blogs , nothing else
ReplyDeleteHi, "enabliling zero_damaged_pages skips all the damaged pages and gives you the salvaged data". To get back corrupted data, you might want to look at the blocks which are corrupted. However I have very less knowledge towards that though.
DeleteHi,
ReplyDeleteCan anyone tell me how to get the size of database in postgresql 8.0 version