Data corruption

From: Konrad Garus <konrad(dot)garus(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Data corruption
Date: 2021-07-21 16:14:21
Message-ID: CAGHa9HXONb+ayki84FrNW1cVLuWrkv_=1q+HAngq923GBvnUgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello,

we have a database that apparently got corrupted in a crash a few months
ago, long enough that there is no "good" backup.

There are at least two tables occasionally emitting errors like this:

ERROR: invalid page header in block 39684 of relation base/12926/32397
while performing SQL query: SQL DML: insert into foo (a, b, c) values (?,
?, ?) | PARAMETERS: foo, bar, baz: org.postgresql.util.PSQLException:
ERROR: invalid page header in block 39684 of relation base/12926/32397

The files in question (e.g. 32397) correspond to the table itself, not an
index. The larger of these tables has two files (32397 & 32397.1), almost
2G total.

To make things worse, vacuum ends with:

ERROR: could not access status of transaction 2828785257
DETAIL: Could not open file "pg_clog/0A89": No such file or directory.

That file is nowhere to be found.

When I do this:

set zero_damaged_pages = on;
select count(*) from foo

It shows 3 errors like this:

WARNING: invalid page header in block 78550 of relation base/12926/31642;
zeroing out page
WARNING: invalid page header in block 78551 of relation base/12926/31642;
zeroing out page
WARNING: invalid page header in block 78552 of relation base/12926/31642;
zeroing out page

One idea I found is to generate a zero file for the missing clog, set
zero_damaged_pages=on, then vacuum or recreate the table. Looks like it
might work, but I'm trying to understand the implications.

1. When I generate a zero clog file like this, what impact will it have on
everything? Can it break anything else in the database, introduce some kind
of inconsistency with other (correctly committed) table data?

2. How can I gauge how much actual data (e.g. how many rows) will be lost
by doing this (zero clog file + vacuum/recreate table)?

3. Is there some place I can look to reliably tell how many rows the table
should have, so I can compare to how many I can read after setting
zero_damaged_pages?

4. Is there any way at all to recover/repair the corrupted data at this
point?

--
Konrad Garus

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Vijaykumar Jain 2021-07-21 16:32:05 Re: Data corruption
Previous Message Yambu 2021-07-21 16:03:45 Re: Explain out put