From: | Florian Weimer <fw(at)deneb(dot)enyo(dot)de>
---|---
To: | "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: | pgsql-general(at)postgresql(dot)org
Subject: | Re: Disk corruption detection
Date: | 2006-06-12 17:55:22
Message-ID: | 87irn6b6np.fsf@mid.deneb.enyo.de
Lists: | pgsql-general
* Jim C. Nasby:
>> Anyway, how would be the chances for PostgreSQL to detect such a
>> corruption on a heap or index data file? It's typically hard to
>> detect this at the application level, so I don't expect wonders. I'm
>> just curious if using PostgreSQL would have helped to catch this
>> sooner.
>
> I know that WAL pages are (or at least were) CRC'd, because there was
> extensive discussion around 32 bit vs 64 bit CRCs.
CRCs wouldn't help here, because the out-of-date copy carries a correct CRC.
That's why this problem is so hard to detect at the application
level. Putting redundancy into the rows themselves doesn't help either, for instance.
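To illustrate the point (a minimal sketch, not PostgreSQL's actual page format; the helper names are hypothetical): if a checksum is computed over the page contents at write time, a stale-but-intact copy of the page still verifies fine, so a checksum alone cannot flag a drive that silently serves an old version:

```python
import zlib

def write_page(data: bytes) -> bytes:
    # Store the page together with a CRC-32 of its contents.
    crc = zlib.crc32(data)
    return crc.to_bytes(4, "big") + data

def verify_page(stored: bytes) -> bool:
    # Recompute the CRC and compare against the stored one.
    crc = int.from_bytes(stored[:4], "big")
    return zlib.crc32(stored[4:]) == crc

old = write_page(b"row version 1")  # page as originally written
new = write_page(b"row version 2")  # later overwrite that the disk lost

# The drive silently served the stale copy -- its CRC is still
# internally consistent, so both pages pass verification.
assert verify_page(old)
assert verify_page(new)
```

Only a mismatch between the page and *some external expectation* (e.g. an LSN or a second copy) can expose the staleness; the page itself is self-consistent.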
> There is no such check for data pages, although PostgreSQL has other
> ways to detect errors. But in a nutshell, if you care about your
> data, buy hardware you can trust.
All hardware can fail. 8-/
AFAIK, compare-on-read is the recommended measure to compensate for this
kind of failure. (The traditional recommendation also includes three
disks, so that you've got a tie-breaker.) It seems to me that
PostgreSQL's MVCC-related "don't directly overwrite data rows" policy
might help to expose this sooner than with direct B-tree updates.
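The compare-on-read idea with a third-disk tie-breaker could be sketched like this (hypothetical helper, not any real storage layer's API): read every replica, and when they disagree, let the majority decide which copy is stale:

```python
from collections import Counter

def compare_on_read(copies: list[bytes]) -> bytes:
    """Return the agreed-upon data, using majority vote on disagreement."""
    counts = Counter(copies)
    if len(counts) == 1:
        # All replicas agree -- the common case.
        return copies[0]
    # On disagreement, the majority wins; with three disks there is
    # always a tie-breaker unless all three copies differ.
    value, votes = counts.most_common(1)[0]
    if votes > len(copies) // 2:
        return value
    raise IOError("no majority among replicas; cannot resolve")

# One disk silently served a stale page; the other two outvote it.
assert compare_on_read([b"v2", b"v2", b"v1"]) == b"v2"
```

With only two copies a mismatch is detectable but not resolvable, which is why the traditional recommendation calls for three disks.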
In this particular case, we would have avoided the failure if we had
properly monitored the disk subsystem (the failure was gradual).
Fortunately, it was just a test system, but it got me worried a bit.