Re: CRCs

From: ncm(at)zembu(dot)com (Nathan Myers)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CRCs
Date: 2001-01-13 10:47:53
Message-ID: 20010113024753.B7991@store.zembu.com
Lists: pgsql-hackers

On Fri, Jan 12, 2001 at 04:38:37PM -0800, Mikheev, Vadim wrote:
> Example.
> 1. Tuple was inserted into index.
> 2. Looking for free buffer bufmgr decides to write index block.
> 3. Following WAL core rule bufmgr first calls XLogFlush() to write
> and fsync log record related to index tuple insertion.
> 4. *Believing* that log record is on disk now (after successful fsync)
> bufmgr writes index block.
>
> If log record was not really flushed on disk in 3. but on-disk image of
> index block was updated in 4. and system crashed after this then after
> restart recovery you'll have unlawful index tuple pointing to where?
> Who knows! No guarantee that corresponding heap tuple was flushed on
> disk.
>
> Isn't database corrupted now?

Note, I haven't read the WAL code, so much of what I've said is based
on what I know is and isn't possible with logging, rather than on
Vadim's actual choices. I know it's *possible* to implement a logging
database which can maintain consistency without need for strict write
ordering; but without strict write ordering, it is not possible to
guarantee durable transactions. That is, after a power outage, such
a database may be guaranteed to recover uncorrupted, but some number
(>= 0) of the last few acknowledged/committed transactions may be lost.

Vadim's implementation assumes strict write ordering, so with (e.g.)
IDE disks a corrupt database is possible in the event of a power
outage. (Database and OS crashes don't count; those don't keep the
blocks from finding their way from the drive's buffers to the platters.) This is
no criticism; it is more efficient to assume strict write ordering,
and a database that can lose (the last few) committed transactions
has limited value.
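
A toy simulation may make the power-outage case concrete. This is only a
sketch under stated assumptions: the CachingDrive class, its methods, and
the "wal" and "index_block" keys are hypothetical names invented for
illustration, and none of it is taken from the WAL code. It models a drive
that acknowledges writes, and reports a successful fsync, while the data is
still in volatile cache.

```python
class CachingDrive:
    """Toy drive that acknowledges writes still held in a volatile cache.

    Hypothetical model for illustration only; not a real disk interface.
    """

    def __init__(self):
        self.platter = {}  # what survives a power outage
        self.cache = []    # acknowledged but not yet durable (key, value) writes

    def write(self, key, value):
        # The write is acknowledged immediately, before it is durable.
        self.cache.append((key, value))

    def fsync(self):
        # Assumption: the drive reports success without draining its cache.
        return True

    def power_loss(self, survivors):
        """Cut power: only the cached writes at positions `survivors` land."""
        for position in survivors:
            key, value = self.cache[position]
            self.platter[key] = value
        self.cache.clear()


drive = CachingDrive()

# Steps 3 and 4 of the quoted example: log record first, then the index block.
drive.write("wal", "insert index tuple -> heap block 7")
flushed = drive.fsync()               # the caller believes the log is on disk
drive.write("index_block", "tuple -> heap block 7")

# The drive happened to write the index block (position 1) first,
# and power failed before the log record (position 0) was written.
drive.power_loss(survivors=[1])
```

After the simulated outage, drive.platter holds the index block but no log
record, which is the state the quoted example warns about. With a drive
that honored write ordering, the log record could not be missing while the
index block is present.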

To achieve disk write-order independence is probably not a worthwhile
goal, but for systems that cannot provide strict write ordering (e.g.,
most PCs) it would be helpful to be able to detect that the database
has become corrupted. In Vadim's example above, if the index were to
contain not only the heap blocks' numbers, but also their CRCs, then
the corruption could be detected when the index is used. When the
block is read in, its CRC is checked, and when it is referenced via
the index, the two CRC values are simply compared and the corruption
is revealed.
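
The cross-check described above can be sketched as follows. This is a
minimal illustration, not a proposal for the actual on-disk format: the
block layout, the function names, and the use of zlib's CRC-32 are all
assumptions made for the example.

```python
import zlib


class CorruptionDetected(Exception):
    """Raised when a CRC check fails."""


def crc(payload: bytes) -> int:
    # zlib's CRC-32 is used purely for illustration.
    return zlib.crc32(payload) & 0xFFFFFFFF


def make_block(payload: bytes) -> dict:
    # Hypothetical block layout: the contents plus a CRC stored alongside.
    return {"payload": payload, "crc": crc(payload)}


def read_block(disk: dict, block_no: int) -> dict:
    """Read a block and check its own stored CRC against its contents."""
    block = disk[block_no]
    if crc(block["payload"]) != block["crc"]:
        raise CorruptionDetected(f"block {block_no}: damaged contents")
    return block


def fetch_via_index(disk: dict, index_entry: tuple) -> bytes:
    """Follow an index entry of the form (heap block number, heap block CRC)."""
    block_no, expected_crc = index_entry
    block = read_block(disk, block_no)
    # The block is internally valid; now compare it with what the index expects.
    if block["crc"] != expected_crc:
        raise CorruptionDetected(f"block {block_no}: index/heap CRC mismatch")
    return block["payload"]


old_block = make_block(b"heap block before the insert")
new_block = make_block(b"heap block after the insert")

# The index block reached disk and records the CRC of the new heap block.
index_entry = (7, new_block["crc"])

consistent_disk = {7: new_block}  # heap write also reached disk
crashed_disk = {7: old_block}     # heap write was lost in the outage


def is_detected(disk: dict) -> bool:
    try:
        fetch_via_index(disk, index_entry)
    except CorruptionDetected:
        return True
    return False
```

Here is_detected(crashed_disk) reports the stale heap block, while
is_detected(consistent_disk) does not. The two checks catch different
failures: the block's own CRC catches damaged contents, and the comparison
with the index catches a block that is intact but out of date.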

On a machine that does provide strict write ordering, the CRCs in the
index might be unnecessary overhead, but they also provide cross-checks
to help detect corruption introduced by bugs and whatnot.

Or maybe I don't know what I'm talking about.

Nathan Myers
ncm(at)zembu(dot)com

In response to

  • RE: CRCs at 2001-01-13 00:38:37 from Mikheev, Vadim

Responses

  • Re: CRCs at 2001-01-13 17:49:34 from Tom Lane
