From: | Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> |
---|---|
To: | Aidan Van Dyk <aidan(at)highrise(dot)ca> |
Cc: | "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Brian Hurt <bhurt(at)janestcapital(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Block-level CRC checks |
Date: | 2008-10-02 17:36:23 |
Message-ID: | BB469073-64E8-4CC8-A8E3-0672242D5347@enterprisedb.com |
Lists: | pgsql-hackers |
On 2 Oct 2008, at 05:51 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
> So if PG currently doesn't care about the hint bits being updated during
> the write, then why should introducing a double-buffer introduce the
> torn-page problem Tom mentions? I admit, I'm fishing for information
> from those in the know, because I haven't been looking at the code long
> enough (or all of it enough) to know all the ins-and-outs...
It's not the buffering, it's the checksum. The problem arises if a page
is read in but no WAL-logged modifications are made to it. If a hint bit
is modified, the change won't be WAL-logged, but the page is still marked
dirty.
When we write the page there's a chance only part of the page actually
makes it to disk if the system crashes before the whole page is flushed.
WAL-logged changes are safe because of full_page_writes. Hint bits are
safe because either the old or the new value will be on disk and we
don't care which. It doesn't matter if some hint bits are set and some
aren't.
However, the checksum won't match, because it was calculated over the
whole block and part of that block was never written.
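To make the failure concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: the page layout (a 4-byte CRC-32 at the start of an 8192-byte page), the 4096-byte atomic write unit, and the helper names are all made up for illustration and are not PostgreSQL's actual page format or API.

```python
import zlib

PAGE_SIZE = 8192
SECTOR = 4096       # assumed atomic write unit in this toy model
CSUM_BYTES = 4      # hypothetical checksum field at the start of the page


def with_checksum(body: bytes) -> bytes:
    """Return a full page: CRC-32 of the body, followed by the body."""
    assert len(body) == PAGE_SIZE - CSUM_BYTES
    return zlib.crc32(body).to_bytes(CSUM_BYTES, "big") + body


def checksum_ok(page: bytes) -> bool:
    """Recompute the CRC-32 of the body and compare it to the stored value."""
    stored = int.from_bytes(page[:CSUM_BYTES], "big")
    return stored == zlib.crc32(page[CSUM_BYTES:])


# The page as it sits on disk before any hint bit is set.
old_body = bytearray(PAGE_SIZE - CSUM_BYTES)
old_page = with_checksum(bytes(old_body))

# A hint bit is set in the second half of the page. Nothing is WAL-logged,
# but the in-memory page is dirty and gets a fresh checksum at write time.
new_body = bytearray(old_body)
new_body[6000] |= 0x01
new_page = with_checksum(bytes(new_body))

# Crash mid-write: only the first sector of the new page reaches disk.
torn_page = new_page[:SECTOR] + old_page[SECTOR:]

# The page contents are simply the old ones (hint bit not set), which is
# harmless, but the stored checksum belongs to the new contents.
print(checksum_ok(old_page), checksum_ok(new_page), checksum_ok(torn_page))
# prints: True True False
```

Without the checksum field the torn page would be indistinguishable from the old page, which is why the hint-bit write is harmless today.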
Writing this explanation did bring to mind one solution which we had
already discussed for other reasons: not marking blocks dirty after
hint bit setting.
Alternatively, is the only case we need to worry about the one where we
detect that a block is dirty but its LSN is older than the last
checkpoint? In that case we could either discard the write or generate a
no-op WAL record just to trigger the full-page write.
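The check behind that alternative could look roughly like the Python sketch below. The Page class, the function names, and the returned action strings are hypothetical, and treating an LSN equal to the checkpoint LSN as "older" is an assumption of this sketch, not something settled in this thread.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int      # LSN of the last WAL-logged change to this page
    dirty: bool   # in-memory copy differs from the on-disk copy


def needs_torn_page_protection(page: Page, last_checkpoint_lsn: int) -> bool:
    """True if the page is dirty but has had no WAL-logged change, and
    therefore no full-page image, since the last checkpoint."""
    return page.dirty and page.lsn <= last_checkpoint_lsn


def flush_action(page: Page, last_checkpoint_lsn: int,
                 discard_hint_only_writes: bool) -> str:
    """Pick one of the two options discussed: drop the write, or emit a
    no-op WAL record so that a full-page image is logged first."""
    if not page.dirty:
        return "nothing to write"
    if not needs_torn_page_protection(page, last_checkpoint_lsn):
        return "write page"
    if discard_hint_only_writes:
        return "discard write"
    return "log no-op WAL record, then write page"


print(flush_action(Page(lsn=100, dirty=True), 200, discard_hint_only_writes=False))
```

A page whose LSN is newer than the last checkpoint already has a full-page image in the WAL, so it takes the plain "write page" path.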