From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Jeff Davis <pgsql(at)j-davis(dot)com> |
Subject: | Re: WAL record CRC calculated incorrectly because of underlying buffer modification |
Date: | 2024-05-12 23:15:03 |
Message-ID: | CA+hUKG+=cb86CYa4W42z4wFBMwjQE2=O9RFC+i4QZuCB+d2p0A@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, May 11, 2024 at 5:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> 11.05.2024 07:25, Thomas Munro wrote:
> > On Sat, May 11, 2024 at 4:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> >> 11.05.2024 06:26, Thomas Munro wrote:
> >>> Perhaps a no-image, no-change registered buffer should not be
> >>> including an image, even for XLR_CHECK_CONSISTENCY? It's actually
> >>> useless for consistency checking too I guess, this issue aside,
> >>> because it doesn't change anything so there is nothing to check.
> >> Yes, I think something wrong is here. I've reduced the reproducer to:
> > Does it reproduce if you do this?
> >
> > - include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;
> > + include_image = needs_backup ||
> > + ((info & XLR_CHECK_CONSISTENCY) != 0 &&
> > + (regbuf->flags & REGBUF_NO_CHANGE) == 0);
>
> No, it doesn't (at least with the latter, more targeted reproducer).
OK so that seems like a candidate fix, but ...
> > Unfortunately the back branches don't have that new flag from 00d7fb5e
> > so, even if this is the right direction (not sure, I don't understand
> > this clean registered buffer trick) then ... but wait, why are there
> > no failures like this in the back branches (yet at least)? Does
> > your reproducer work for 16? I wonder if something relevant changed
> > recently, like f56a9def. CC'ing Michael and Amit K for info.
>
> Maybe it's hard to hit (autovacuum needs to process the index page in a
> narrow time frame), but locally I could reproduce the issue even on
> ac27c74de (and ac27c74de~1 too) from 2018-09-06 (I tried several of the
> most recent commits touching hash indexes, but didn't dig deeper).
... we'd need to figure out how to fix this in the back-branches too.
One idea would be to back-patch REGBUF_NO_CHANGE, and another might be
to deduce that case from other variables. Let me CC a couple more
people from this thread, who most recently hacked on this stuff, to
see if they have insights: