Re: corrupt pages detected by enabling checksums

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: corrupt pages detected by enabling checksums
Date: 2013-04-30 12:55:50
Message-ID: CA+U5nMJxovcjpoZ7_K67eaSHANBWcW5RHQkj+zDEPUaEASrr_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30 April 2013 13:34, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Apr 30, 2013 at 6:58 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 9 April 2013 08:36, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>
>>> 1. I believe that the issue I brought up at the end of this email:
>>>
>>> http://www.postgresql.org/message-id/1365035537.7580.380.camel@sussancws0025
>>>
>>> is a real issue. In lazy_vacuum_page(), the following sequence can
>>> happen when checksums are on:
>>>
>>> a. PageSetAllVisible
>>> b. Pass heap page to visibilitymap_set
>>> c. visibilitymap_set logs the heap page and sets the LSN
>>> d. MarkBufferDirty
>>>
>>> If a checkpoint happens between (c) and (d), then we have a problem. The
>>> fix is easy: just mark the heap buffer dirty first. There's another call
>>> site that looks like a potential problem, but I don't think it is. I
>>> simplified the code there to make it (hopefully) more clearly correct.
>>
>> Applied
>
> Uh, wait a minute. I think this is completely wrong.

Thanks for your review.

> The buffer is
> LOCKED for this entire sequence of operations. For a checkpoint to
> "happen", it's got to write every buffer, which it will not be able to
> do for so long as the buffer is locked.

I was following the logic we use for WAL: mark the buffer dirty, then write WAL.

The important thing is for us to signal that the buffer should be
written, which we do by marking it dirty. The actual writing of the
buffer comes later, often minutes later.

> The effect of the change to lazy_scan_heap is to force the buffer to
> be written even if we're only updating the visibility map page.
> That's a bad idea and should be reverted.

OK, agreed. Will wait for this discussion to end before acting though,
so I'm sure exactly what you mean.

> The change to
> lazy_vacuum_page is harmless, but the comment is wrong.

In what way, wrong? What should it say?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2013-04-30 13:00:46 Re: The missing pg_get_*def functions
Previous Message Stephen Frost 2013-04-30 12:51:23 Re: Remaining beta blockers