From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Hot standby queries see transient all-zeros pages |
Date: | 2024-12-14 00:41:40 |
Message-ID: | CA+hUKGLVoihg6T5XCDBeUYsTHLO_+id=c9HarwRGRCQB=76KqA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Dec 14, 2024 at 11:41 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2024-05-12 10:16:58 -0700, Noah Misch wrote:
> > The use of RBM_ZERO_AND_LOCK is incompatible with that. See a similar
> > argument at https://postgr.es/m/flat/5101(dot)1328219790(at)sss(dot)pgh(dot)pa(dot)us that led
> > me to the cause. Adding a 10ms sleep just after RBM_ZERO_AND_LOCK, I got 2
> > failures in 7 runs of 027_stream_regress.pl, at Assert(ItemIdIsNormal(lpp))
> > in heapgettup_pagemode(). In the core file, lpp pointed into an all-zeros
> > page. RestoreBkpBlocks() had been doing RBM_ZERO years before hot standby
> > existed, but it wasn't a bug until queries could run concurrently.
>
> Yep, this does sound like a problem.
This was a bug of mine that was fixed later in another thread, which
Noah might have mistaken for intended-but-wrong behaviour, because the
name is weird IMHO.
https://www.postgresql.org/message-id/flat/7ed10231-ce47-03d5-d3f9-4aea0dc7d5a4%40gmail.com
> Hm. Leaving RBM_ZERO_AND_LOCK aside, is it actually always safe to do
> RestoreBlockImage() into a buffer that currently is pinned? Not sure if
> there's actually all that much guarantee what transient state one can read
> when reading a page concurrently to a memcpy(). I suspect it's practically
> rare to see a problem, but one could imagine an memcpy implementation that
> uses non-temporal writes, which afaict would leave you open to seeing quite
> random states when reading concurrently, as the cache coherence protocol
> doesn't protect anymore.
I also found that strange, while working on commit e656657f. I
observed and pondered the behaviour you mentioned while testing that:
you're scanning a page holding only a pin, which on a primary would
allow others to add more (invisible to you) tuples that you shouldn't
attempt to access (based on visibility checks performed while you held
the lock), and in recovery the exact same thing is possible, except
that if it happens to include a FPI then it *will* overwrite the
tuples that you are scanning... with the exact same ones-and-zeros.
But I couldn't think of anything specifically wrong with it, ie that
might break. Well, maybe it's not exactly the same ones-and-zeros in
some hint-bit scenario, but if that is considered acceptable on a
primary too (well maybe you're getting rid of it), but what's the
difference? Oh... that's a plain store and you're thinking that
memcpy() might be allowed to use some weirder cache coherency modes
for speed, ... sounds likely to break a lot of stuff, is that a real
thing?!
> > I suspect the fix is to add a ReadBufferMode specified as, "If the block is
> > already in shared_buffers, do RBM_NORMAL and exclusive-lock the buffer.
> > Otherwise, do RBM_ZERO_AND_LOCK."
>
> I think that should work. At least in the current code it looks near trivial
> to implement, although the branch differences are going to be annoying.
> As usual the hardest part would probably be the naming. Maybe
> RBM_ZERO_ON_MISS_LOCK? RBM_LOCK_ZERO_ON_MISS? RBM_DWIM?
That's already how RBM_ZERO_AND_LOCK works, and you are the third
victim of its confusing name counting me (that's how I broke it in the
vectored stuff) and Noah IIUC what happened here.
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2024-12-14 00:45:33 | Re: checkpointer: PANIC: could not fsync file: No such file or directory |
Previous Message | Noah Misch | 2024-12-14 00:38:05 | Re: Hot standby queries see transient all-zeros pages |