Re: tackling full page writes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tackling full page writes
Date: 2011-05-26 02:09:45
Message-ID: BANLkTimmz+SHdtG5104uWxtd9Zg+a4v4cw@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 25, 2011 at 9:34 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, May 24, 2011 at 10:52 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> On Tue, 2011-05-24 at 16:34 -0400, Robert Haas wrote:
>>> As I think about it a bit more, we'd
>>> need to XLOG not only the parts of the page we're actually modifying, but
>>> any that the WAL record would need to be correct on replay.
>>
>> I don't understand that statement. Can you clarify?
>
> I'll try.  Suppose we have two WAL records A and B, with no
> intervening checkpoint, that both modify the same page.  A reads chunk
> 1 of that page and then modifies chunk 2.  B modifies chunk 1.  Now,
> suppose we make A do a "partial page write" on chunk 2 only, and B do
> the same for chunk 1.  At the point the system crashes, A and B are
> both on disk, and the page has already been written to disk as well.
> Replay begins from a checkpoint preceding A.
>
> Now, when we get to the record for A, what are we to do?  If it were a
> full page image, we could just restore it, and everything would be
> fine after that.  But if we replay the partial page write, we've got
> trouble.  A will now see the state of chunk 1 as it existed after
> the action protected by B occurred, and will presumably do the wrong
> thing.

If this were really true, full page writes would have a similar problem,
no? Imagine the case where A reads page 1 and then modifies page 2, and B
modifies page 1. During recovery, A will see the state of page 1 as it
existed after the action by B.

The replay of the WAL record for A doesn't rely on the contents of chunk 1,
which B modified. So I don't think that "partial page writes" have such
a problem. No?
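
To make the scenario concrete, here is a rough Python sketch. It is a toy
model only, not PostgreSQL code: the page is a dict of chunk number to value,
and every name in it (replay_images, replay_recomputed, the chunk values) is
made up for illustration. It assumes A's record carries the resulting image
of chunk 2, so replay overwrites chunk 2 without reading chunk 1. For
contrast, it also shows a record that recomputes its result from chunk 1 at
replay time, which is the kind of record Robert's concern would apply to.

```python
# Toy model (hypothetical): a page is {chunk_number: value}.
# Original execution, starting from {1: 10, 2: 0}:
#   A reads chunk 1 (10) and sets chunk 2 to 11.
#   B sets chunk 1 to 99.
# The page reached disk before the crash, so replay starts from this state.
ON_DISK_AT_CRASH = {1: 99, 2: 11}
EXPECTED = {1: 99, 2: 11}


def replay_images(page, records):
    """Replay records that carry the final image of the chunk they changed."""
    page = dict(page)
    for chunk, image in records:
        page[chunk] = image  # plain overwrite; no other chunk is read
    return page


def replay_recomputed(page, records):
    """Replay records that recompute their result from the current page."""
    page = dict(page)
    for apply_record in records:
        apply_record(page)
    return page


# Case 1: A and B each log the image of the one chunk they modified.
image_records = [(2, 11), (1, 99)]  # A, then B
image_result = replay_images(ON_DISK_AT_CRASH, image_records)


# Case 2: A's replay re-reads chunk 1, which already holds B's change.
def record_a(page):
    page[2] = page[1] + 1


def record_b(page):
    page[1] = 99


recomputed_result = replay_recomputed(ON_DISK_AT_CRASH, [record_a, record_b])

print("image replay:     ", image_result, image_result == EXPECTED)
print("recomputed replay:", recomputed_result, recomputed_result == EXPECTED)
```

In this model the image-carrying records replay to the expected page, and
only the record that re-reads chunk 1 goes wrong. The sketch does not model
torn pages.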

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
