Re: Would like to below scenario is possible for getting page/block corruption

From: Sreekanth Palluru <sree4pg(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Would like to below scenario is possible for getting page/block corruption
Date: 2016-12-09 03:09:05
Message-ID: CAP+fnpiz8udmeT+bixPSBXStHVfFJ7JXmAEQw9XtLEkkAb_g7Q@mail.gmail.com
Lists: pgsql-admin pgsql-general

Michael,
Thanks for your prompt reply.

In my environment both of those parameters are enabled. Here is a brief
summary of the PG database environment:
Version 9.2.4.1
Windows 7 Professional SP1
fsync=on
full_page_writes=on
wal_sync_method=open_datasync
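
As an illustration of how the settings above could be checked across many shipped images, here is a minimal Python sketch. It assumes the settings are available as postgresql.conf-style text in the simple "name = value" form. The helper names (parse_conf, unsafe_settings) are hypothetical, and the parser deliberately ignores include files, quoting rules and other details of the real configuration format.

```python
# Sketch only: a simplified reader for "name = value" lines, not a full
# postgresql.conf parser (no include directives, no escaped quotes).
TRUTHY = {"on", "true", "yes", "1"}

# Settings this thread treats as required for crash safety.
DURABILITY_SETTINGS = ("fsync", "full_page_writes")


def parse_conf(text):
    """Return a dict of lower-cased setting names to lower-cased values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # this sketch skips lines without "="
        settings[name.strip().lower()] = value.strip().strip("'").lower()
    return settings


def unsafe_settings(settings):
    """List durability settings that are explicitly turned off.

    A missing entry is treated as "on", which is the default for both
    fsync and full_page_writes.
    """
    return sorted(
        name
        for name in DURABILITY_SETTINGS
        if settings.get(name, "on") not in TRUTHY
    )


SAMPLE = """
fsync=on
full_page_writes=on
wal_sync_method=open_datasync
"""

if __name__ == "__main__":
    print(unsafe_settings(parse_conf(SAMPLE)))
```

A check like this only confirms what the configuration file says. It cannot tell whether the disk or RAID write cache honours the flush requests.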

My customer builds cancer-related systems, and we ship Dell systems with a
software image that contains PG. A few of the customers, roughly 5%, are
facing corruption issues.
We are in the process of reproducing the issue. Since many variables are
involved (Dell hardware, software image versions, application versions,
RAID/disk write-cache settings, RAID controllers with no backup, power
failures, etc.), I am trying to understand whether PG can end up with
corrupted blocks because of a system crash.

1) As I understand it, fsync will write the block from memory to disk, so
the block just after step 4) would have been written to disk, assuming the
disk cache did not lie.
2) Assume that full_page_writes=on has dumped the whole 8k block into WAL
before it updates the block, i.e. after step 2) and before step 3).
3) If a crash happens after step 4), since there is no PageHeader data,
after the system restarts PG will complain that the block is corrupted or
has an invalid header.

Please correct me if my understanding of the roles of fsync and
full_page_writes is wrong. If it is correct, I see a possibility of
corruption whenever PG extends a relation and a crash happens just after
step 4).
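
To make the "no PageHeader data" case concrete, here is a simplified Python sketch of a page-header sanity check. It is an illustration under stated assumptions, not PostgreSQL's actual code. It assumes an 8 kB block, a 24-byte little-endian header with fields in the order pd_lsn, pd_tli, pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version, pd_prune_xid, and 8-byte alignment. The function name classify_page is hypothetical. The sketch treats a page of all zeros as a new, empty page rather than as a corrupted one, which is the case a freshly extended block falls into.

```python
import struct

BLCKSZ = 8192  # assumed block size (8 kB)

# Assumed header layout: pd_lsn (8 bytes, value ignored here), pd_tli,
# pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version (2 bytes
# each), pd_prune_xid (4 bytes).
HEADER_FMT = "<QHHHHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 24 bytes


def classify_page(page):
    """Return 'zeroed', 'valid' or 'invalid' for one raw block."""
    if len(page) != BLCKSZ:
        raise ValueError("expected exactly one block of BLCKSZ bytes")

    # A block that was allocated but never initialised is all zeros.
    if page == bytes(BLCKSZ):
        return "zeroed"

    (_lsn, _tli, _flags, lower, upper, special,
     size_version, _prune_xid) = struct.unpack_from(HEADER_FMT, page)

    page_size = size_version & 0xFF00  # high byte carries the page size
    header_ok = (
        page_size == BLCKSZ
        and HEADER_SIZE <= lower <= upper <= special <= BLCKSZ
        and special % 8 == 0  # assumed 8-byte maximum alignment
    )
    return "valid" if header_ok else "invalid"


if __name__ == "__main__":
    empty_header = struct.pack(
        HEADER_FMT, 0, 0, 0, HEADER_SIZE, BLCKSZ, BLCKSZ, BLCKSZ | 4, 0
    )
    initialised = empty_header + bytes(BLCKSZ - HEADER_SIZE)
    print(classify_page(bytes(BLCKSZ)))      # never-initialised block
    print(classify_page(initialised))        # initialised, empty page
    print(classify_page(b"\xff" * BLCKSZ))   # garbage contents
```

Running a check of this kind over a copy of a damaged relation file could help separate blocks that are merely zeroed from blocks whose header fields are inconsistent.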

I am not sure whether the same applies to an existing page (not a new page),
or how PG handles it when a PageHeader is available as part of
full_page_writes. Can the same corruption happen, or can PG recover the
database? I am not sure whether the recovery process can update the
PageHeader from the WAL records, using the recptr it wrote as part of
step 4).

-Sreekanth

On Fri, Dec 9, 2016 at 12:44 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> (Please top-post that's annoying)
>
> On Fri, Dec 9, 2016 at 10:28 AM, Sreekanth Palluru <sree4pg(at)gmail(dot)com>
> wrote:
> > Can I generalize that, if after step 4) page ( new page or old page) got
> > written disk from buffer and crash happens between step 4) and 5) we
> > always get block corruption issues with Postgres which can only be
> > recovered by setting zero_damaged_pages if we just have pg_dump backups
> > and we are OK lose data in the affected blocks?
> >
> > I am also looking at ways of reproducing the issue ? appreciate your
> > advice on it ?
>
> Postgres is designed to avoid such corruption problems if
> full_page_writes and fsync are enabled, that's a base stone of its
> reliability. If you can create a self-contained scenario able to
> reproduce a failure, that could be treated as a Postgres bug, but you
> are giving no evidence that this is the case.
> --
> Michael
>

--
Regards
Sreekanth
