Re: WAL replay bugs

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: WAL replay bugs
Date: 2014-04-23 12:43:46
Message-ID: 5357B582.7060707@vmware.com
Lists: pgsql-hackers

On 04/17/2014 07:59 PM, Heikki Linnakangas wrote:
> On 04/08/2014 06:41 AM, Michael Paquier wrote:
>> On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>>
>>> I've been playing with a little hack that records a before and after image
>>> of every page modification that is WAL-logged, and writes the images to a
>>> file along with the LSN of the corresponding WAL record. I set up a
>>> master-standby replication with that hack in place in both servers, and ran
>>> the regression suite. Then I compared the after images after every WAL
>>> record, as written on master, and as replayed by the standby.
>> Assuming that adding some dedicated hooks in the core, able to run
>> actions before and after a page modification occurs, is not *that*
>> costly (well, I imagine that it is not acceptable in terms of
>> performance), could it be possible to get that in the shape of an
>> extension that could be used to test WAL record consistency? This may
>> be an idea to think about...
>
> Yeah, working on it. It can live as a patch set if nothing else.
>
> This has been very fruitful; I just committed another fix for a bug I
> found with this earlier today.
>
> There are quite a few things that cause differences between master and
> standby. We have hint bits in many places, unused space that isn't
> zeroed etc.
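
To illustrate the unused-space case, here is a rough Python sketch (not
part of the attached patch) of zeroing the free-space hole between
pd_lower and pd_upper before two page images are compared. It assumes
the standard page header layout, with pd_lower and pd_upper as 16-bit
fields at byte offsets 12 and 14 and a 24-byte header, and little-endian
byte order. The function name mask_unused_space is made up for
illustration. Hint bits would need separate, per-page-type handling that
is not shown here.

```python
import struct

HEADER_SIZE = 24  # assumed size of the fixed page header


def mask_unused_space(page: bytes) -> bytes:
    """Return a copy of the page with the pd_lower..pd_upper hole zeroed."""
    # pd_lower and pd_upper are assumed to be little-endian uint16 values
    # at byte offsets 12 and 14 of the page header.
    pd_lower, pd_upper = struct.unpack_from("<HH", page, 12)
    # Leave pages with an implausible header untouched rather than guess.
    if not (HEADER_SIZE <= pd_lower <= pd_upper <= len(page)):
        return page
    hole = b"\x00" * (pd_upper - pd_lower)
    return page[:pd_lower] + hole + page[pd_upper:]
```

Two images that differ only in the hole compare equal after both have
been passed through this function.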

[a few more fixed bugs later]

Ok, I'm now getting clean output when running the regression suite with
this tool.

And here is the tool itself. It consists of two parts:

1. Modifications to the backend to write the page images
2. A post-processing tool to compare the logged images between master
and standby.

The attached diff contains both parts. The post-processing tool is in
contrib/page_image_logging. See contrib/page_image_logging/README for
instructions. Let me know if you have any questions or need further help
running the tool.
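
Conceptually, the comparison step boils down to something like the
following Python sketch. This is not the tool in
contrib/page_image_logging; the record layout (LSN, page identifier,
after image) and the names PageImage and compare_images are assumptions
made purely for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PageImage(NamedTuple):
    lsn: int       # LSN of the WAL record that modified the page
    page_id: str   # hypothetical page identifier, e.g. "relfilenode/blockno"
    after: bytes   # page contents after the modification


def compare_images(master: Iterable[PageImage],
                   standby: Iterable[PageImage]) -> List[Tuple[int, str, str]]:
    """Return (lsn, page_id, reason) for each after image that does not match."""
    # Index the standby's images by (LSN, page) for direct lookup.
    standby_by_key: Dict[Tuple[int, str], bytes] = {
        (img.lsn, img.page_id): img.after for img in standby
    }
    mismatches = []
    for img in master:
        replayed = standby_by_key.get((img.lsn, img.page_id))
        if replayed is None:
            mismatches.append((img.lsn, img.page_id, "missing on standby"))
        elif replayed != img.after:
            mismatches.append((img.lsn, img.page_id, "after image differs"))
    return mismatches
```

An empty result corresponds to the clean output mentioned above. Each
reported LSN points at a WAL record whose replay produced a different
page than the original modification on the master.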

I've also pushed this to my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
"page_image_logging". I intend to keep it up-to-date with current master.

This is a pretty ugly hack, so I'm not proposing to commit this in its
current state. But perhaps this could be done more cleanly, by adding
some hooks in the backend as Michael suggested.

- Heikki

Attachment: page_image_logging-1.patch (text/x-diff, 24.8 KB)
