Re: Checksums, state of play

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksums, state of play
Date: 2012-03-06 20:07:23
Message-ID: CA+TgmobP4g-zfEmThiYG=8h3wGVKTqthwXTWdj4qZd9UHiJwDA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 6, 2012 at 12:50 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I think the "turning checksums on/off/on/off" is really a killer
> problem, and obviously many of the actions needed to make it safe make
> the checksum feature itself less useful.
>
> One crazy idea would be to have a checksum _version_ number somewhere on
> the page and in pg_controldata.  When you turn on checksums, you
> increment that value, and all new checksum pages get that checksum
> version; if you turn off checksums, we just don't check them anymore,
> but they might get incorrect due to a hint bit write and a crash.  When
> you turn on checksums again, you increment the checksum version again,
> and only check pages having the _new_ checksum version.
>
> Yes, this does add additional storage requirements for the checksum, but
> I don't see another clean option.  If you can spare one byte, that gives
> you 255 times to turn on checksums; after that, you have to
> dump/reload to use the checksum feature.

I don't see what problem that solves. It's just taking the problem we
already have and adding a new failure mode (out of checksum versions) on top
of it. If you see a page with checksum version 152 and the current
checksum version is 153, then you know that either (a) it is the
result of a previous iteration of turning checksums on or (b) the
checksum version number got corrupted. This is the exact same problem
we have with using a PD_HAS_CHECKSUM bit. If the bit is not set, then
you know that either (a) it hasn't been checksummed yet or (b) the bit
got corrupted. In either case, a single poorly placed bit-flip gives
rise to the exact same confusion.
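
To make the parallel concrete, here is a minimal Python sketch of the two
schemes. Everything in it is an assumption made for illustration: the
function names, the flag value, and the use of CRC-32 are hypothetical and
are not PostgreSQL's page format or checksum algorithm. It only shows that
one flipped header bit makes either scheme skip verification, so damage
elsewhere on the page goes unnoticed.

```python
import zlib

# Hypothetical flag value; only the name comes from the discussion above.
PD_HAS_CHECKSUM = 0x01


def checksum(payload: bytes) -> int:
    """Stand-in checksum (CRC-32), assumed purely for illustration."""
    return zlib.crc32(payload)


def verify_with_flag(flags: int, stored: int, payload: bytes) -> str:
    """Flag scheme: verify only pages whose PD_HAS_CHECKSUM bit is set."""
    if not flags & PD_HAS_CHECKSUM:
        # Either the page was never checksummed, or the bit got corrupted.
        return "skipped"
    return "ok" if stored == checksum(payload) else "corrupt"


def verify_with_version(page_version: int, current_version: int,
                        stored: int, payload: bytes) -> str:
    """Version scheme: verify only pages stamped with the current version."""
    if page_version != current_version:
        # Either an earlier on/off cycle, or the version byte got corrupted.
        return "skipped"
    return "ok" if stored == checksum(payload) else "corrupt"


def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-bit storage error."""
    return value ^ (1 << bit)


good = b"tuple data"
stored = checksum(good)
damaged = b"tuple dat\x00"  # payload damage that the checksum should catch

# With intact headers, both schemes detect the damaged payload.
assert verify_with_flag(PD_HAS_CHECKSUM, stored, damaged) == "corrupt"
assert verify_with_version(153, 153, stored, damaged) == "corrupt"

# With one flipped header bit, both schemes skip the check entirely.
flag_result = verify_with_flag(flip_bit(PD_HAS_CHECKSUM, 0), stored, damaged)
version_result = verify_with_version(flip_bit(153, 0), 153, stored, damaged)
print(flag_result, version_result)
```

Flipping the low bit of 153 yields 152, so the version check cannot tell a
corrupted version byte apart from a page left over from an earlier cycle,
just as the flag check cannot tell a cleared bit apart from a page that was
never checksummed.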

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
