Re: multiple tables got corrupted

From: Jerry Sievers <gsievers19(at)comcast(dot)net>
To: Vasu Madhineni <vasumdba1515(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: multiple tables got corrupted
Date: 2020-09-18 14:07:42
Message-ID: 87d02jkwk1.fsf@jsievers.enova.com
Lists: pgsql-general

Vasu Madhineni <vasumdba1515(at)gmail(dot)com> writes:

> Hi Magnus,
>
> Thanks for your update.
> To identify the number of corrupted tables in the database, if I run
> the below command, will it have any impact on other tables in the
> production environment?
>
> "pg_dump -f /dev/null database"

Consider using pg_dump or any other means to dump *each* table
individually.

pg_dump is going to abort on the first case of corruption in any table
that results in a read error on a full scan, so in a scenario where
multiple corrupt tables are likely, you're not going to get very far
with a monolithic approach.
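
For example, a minimal sketch of the per-table approach, assuming a
database named mydb (adjust names and connection options to your
setup; this also assumes table names without embedded whitespace):

    #!/bin/sh
    # Try to dump each user table individually; a corrupt table then
    # only fails its own dump instead of aborting the entire run.
    for t in $(psql -d mydb -At -c "
          select quote_ident(schemaname) || '.' || quote_ident(tablename)
            from pg_tables
           where schemaname not in ('pg_catalog', 'information_schema')")
    do
        if pg_dump -d mydb -t "$t" -f /dev/null; then
            echo "ok:      $t"
        else
            echo "SUSPECT: $t"
        fi
    done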

>
> Thanks in advance.
>
> Regards,
> Vasu Madhineni
>
> On Fri, Sep 18, 2020 at 3:42 PM Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
> That depends on what the problem is and how they fix it. Most
> likely yes -- especially since if you haven't enabled data
> checksums you won't *know* if things are OK or not. So I'd
> definitely recommend it even if things *look* OK.
>
> //Magnus
>
>
> On Wed, Sep 16, 2020 at 5:06 AM Vasu Madhineni <
> vasumdba1515(at)gmail(dot)com> wrote:
>
> I could see block read I/O errors in /var/log/syslog. If those
> errors are fixed by the OS team, will it require recovery?
>
> Also, can I use LIMIT and OFFSET to locate corrupted rows?
>
> Thanks in advance
>
> Regards,
> Vasu Madhineni
>
> On Wed, Sep 16, 2020, 01:58 Magnus Hagander <
> magnus(at)hagander(dot)net> wrote:
>
> Try reading them "row by row" until it breaks. That is,
> SELECT * FROM ... LIMIT 1, then LIMIT 2, etc. For more
> efficiency, use a binary search, starting at what seems
> like a reasonable place given the size of the table
> vs the first failed block, but the principle is the
> same. Once it fails, you've found a corrupt block...
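>
> A sketch of that binary search, assuming a table named mytable
> (wrapping in count(*) keeps psql from printing all the rows while
> still forcing them to be read):
>
>   -- succeeds: the first 100000 rows are readable
>   SELECT count(*) FROM (SELECT * FROM mytable LIMIT 100000) s;
>   -- fails with a read error: the corruption lies somewhere in
>   -- rows 100001..200000, so bisect between those two limits
>   SELECT count(*) FROM (SELECT * FROM mytable LIMIT 200000) s;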
>
> //Magnus
>
>
> On Tue, Sep 15, 2020 at 12:46 PM Vasu Madhineni <
> vasumdba1515(at)gmail(dot)com> wrote:
>
> Is it possible to identify which rows are corrupted
> in particular tables?
>
> On Tue, Sep 15, 2020 at 5:36 PM Magnus Hagander <
> magnus(at)hagander(dot)net> wrote:
>
>
>
> On Tue, Sep 15, 2020 at 11:15 AM Vasu Madhineni <
> vasumdba1515(at)gmail(dot)com> wrote:
>
> Hi All,
>
> In one of my postgres databases multiple
> tables got corrupted. I followed the below
> steps, but still get the same error:
>
> 1. SET zero_damaged_pages = on
> 2. VACUUM ANALYZE, VACUUM FULL
>
>
>
> That is a very destructive first attempt. I hope
> you took a full disk-level backup of the database
> before you did that, as it can ruin your chances
> for forensics and data recovery for other issues.
>
>
>
> moh_fa=# VACUUM FULL;
> ERROR:  could not read block 9350 in file
> "base/1156523/1270812": Input/output error
>
> Tried to take a backup of the tables with
> pg_dump but got the same error. The files
> exist physically in the base location.
>
> How to proceed on this? There is no backup to restore.
>
>
>
> This is clearly some sort of disk error, and with
> no backups to restore you will definitely be
> losing data.
>
> I'd start by figuring out which tables have no
> corruption and do work, and back those up (with
> pg_dump for example) as soon as possible to a
> different machine -- since it's not exactly
> unlikely that further disk errors will appear.
>
> Once you've done that, identify the tables, and
> then try to do partial recovery. For example, if
> you look at the file 1270812, how big is it?
> PostgreSQL is failing to read block 9350, which
> is 76595200 bytes into the file. If this is at
> the very end of the file, you can for example try
> to get the data out until that point with LIMIT.
> If it's in the middle of the file, it gets more
> tricky, but similar approaches can be done.
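>
> A rough sketch of that, assuming the damaged table is mytable,
> the default 8kB block size, and that a LIMIT of 100000 was the
> last one that still read cleanly in the binary search above:
>
>   -- 9350 blocks * 8192 bytes/block = 76595200 bytes
>   CREATE TABLE mytable_salvaged AS
>     SELECT * FROM mytable LIMIT 100000;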
>
> Also, unless you are running with data checksums
> enabled, I wouldn't fully trust the data in the
> tables that you *can* read either. Since you
> clearly have disk issues, they may have caused
> corruption elsewhere as well, so whatever
> verification you can do against other tables, you
> should do as well.
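>
> If checksums *are* enabled, one way to do that verification,
> assuming PostgreSQL 12 or later and a cleanly stopped server,
> is the bundled pg_checksums tool:
>
>   pg_checksums --check -D /path/to/data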
>
>
> You'll of course also want to check any kernel
> logs or storage system logs to see if they can
> give you a hint as to what happened, but they are
> unlikely to actually give you something that will
> help you fix the problem.
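>
> For example, something along these lines, assuming a
> Debian/Ubuntu-style syslog location:
>
>   dmesg | grep -i 'i/o error'
>   grep -i 'i/o error' /var/log/syslog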
>
>
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres(dot)consulting(at)comcast(dot)net
