Re: Somewhat automated method of cleaning table of corrupt records for pg_dump

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Heiko Wundram <modelnine(at)modelnine(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Somewhat automated method of cleaning table of corrupt records for pg_dump
Date: 2012-10-22 20:34:20
Message-ID: 20121022203419.GA27021@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Oct 22, 2012 at 11:54:47AM +0200, Heiko Wundram wrote:
> If there's any other possibility of "out of the box" recovery -
> except writing myself a small script to walk all rows - I'd still be
> grateful for a hint.

Something that has worked for me in the past is:

$ SELECT ctid FROM table WHERE length(field) < 0;

This gives you a list of ctids (if it works) which you can delete. You
can also look for very large lengths. This works because length()
doesn't actually unpack the string, it just pulls the length.

It doesn't always work, it depends on the kind of corruption. You also
need to start at the leftmost text field and work forwards, because it
blows up while unpacking the tuples.

Otherwise you're back to doing things like:

$ SELECT sum(length(field || '')) FROM table OFFSET x LIMIT y;

And doing a bisect type algorithm to narrow down where it is. The
sum(length()) is so you throw away the output after checking field can
be extracted properly. Once you get close you start printing the ctids
and take a stab at the ctid of the broken row and delete it.

If your table is large the OFFSET/LIMIT get slow. Unfortunatly Postgres
doesn't understand clauses like 'ctid > (page,tuple)' to start scanning
at a particular spot in the table.

It's not automated, though it might not be hard to do.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nikolas Everett 2012-10-22 21:55:07 9.1 to 9.2 requires a dump/reload?
Previous Message Jeff Janes 2012-10-22 20:26:03 Re: Plug-pull testing worked, diskchecker.pl failed