Re: multiple tables got corrupted

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Vasu Madhineni <vasumdba1515(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: multiple tables got corrupted
Date: 2020-09-15 09:36:02
Message-ID: CABUevEya5XGeX0WRV+iEyXtDPhR1EzXwLS-RE6jdgV03Q42+YQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Sep 15, 2020 at 11:15 AM Vasu Madhineni <vasumdba1515(at)gmail(dot)com>
wrote:

> Hi All,
>
> In one of my postgres databases multiple tables got corrupted and followed
> the below steps but still the same error.
>
> 1.SET zero_damaged_pages = on
> 2. VACUUM ANALYZE, VACUUM FULL
> but still same error.
>

That is a very destructive first attempt. I hope you took a full disk-level
backup of the database before you did that, as it can ruin your chances for
forensics and data recovery for other issues.

moh_fa=# VACUUM FULL;
> ERROR: could not read block 9350 in file "base/1156523/1270812":
> Input/output error
>
> Tried to take backup of tables with pg_dump but same error. files exist
> physically in base location.
>
> How to proceed on this, no backup to restore.
>
>
This is clearly some sort of disk error, and with no backups to restore you
will definitely be losing data.

I'd start by figuring out which tables have no corruption and do work, and
back those up (with pg_dump for example) as soon as possible to a different
machine -- since it's not exactly unlikely that further disk errors will
appear.

Once you've done that, identify the tables, and then try to do partial
recovery. For example, if you look at the file 1270812, how big it is?
PostgreSQL is failing to read block 9350 which is 76595200 bytes into the
file. If this is at the very end of the file, you can for example try to
get the data out until that point with LIMIT. If it's in the middle of the
file, it gets more ticky, but similar approaches can be done.

Also, unless you are running with data checksums enabled, I wouldn't fully
trust the data in the tables that you *can* read either. Since you clearly
have disk issues, they may have caused corruption elsewhere as well, so
whatever verification you can do against other tables, you should do as
well.

You'll of course also want to check any kernel logs or storage system logs
to see if they can give you a hint as to what happened, but they are
unlikely to actually give you something that will help you fix the problem.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vasu Madhineni 2020-09-15 10:46:34 Re: multiple tables got corrupted
Previous Message Vasu Madhineni 2020-09-15 09:00:02 multiple tables got corrupted