From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Date: 2019-01-29 20:23:51
Message-ID: 20190129202351.vqe45bjfbq3osgp6@alap3.anarazel.de
Lists: pgsql-hackers
On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > runs. Looking at the buildfarm it looks like the failures were,
> > > excluding handfish which failed without recognizable symptoms before and
> > > after, on BSD-derived platforms (NetBSD, FreeBSD, OS X), which certainly
> > > is interesting.
> >
> > Isn't it now. Something about the BSD scheduler perhaps? But we've
> > got four or five different BSD-ish platforms that reported failures,
> > and it's hard to believe they've all got identical schedulers.
> >
> > That second handfish failure does match the symptoms elsewhere:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> >
> > --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr 2018-10-30 20:11:45.551967381 -0300
> > +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr 2019-01-28 22:38:20.614211568 -0200
> > @@ -0,0 +1,20 @@
> > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> >
> > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > way higher than elsewhere. Puzzling.
>
> Interesting.
>
> While chatting with Robert about this issue I came across the following
> section of code:
>
> /*
> * If the FSM knows nothing of the rel, try the last page before we
> * give up and extend. This avoids one-tuple-per-page syndrome during
> * bootstrapping or in a recently-started system.
> */
> if (targetBlock == InvalidBlockNumber)
> {
> BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
> if (nblocks > 0)
> targetBlock = nblocks - 1;
> }
>
>
> I think that explains the issue (albeit not why it is so much more frequent
> on BSDs). Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized *and* is not yet in the
> FSM. The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked at that page (before the extending
> backend acquired a lock!), PageGetHeapFreeSpace(), via PageGetFreeSpace(),
> decides there's no free space, because it just interprets the zeroes in
> pd_upper and pd_lower as meaning no free space.
>
> Hm, thinking about what a good solution here could be.
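To spell out the free-space point from the quote above: this is roughly
what PageGetFreeSpace() boils down to -- a simplified sketch, not the
exact bufpage.c source:

#include "postgres.h"
#include "storage/bufpage.h"

/*
 * Sketch of the pd_upper/pd_lower arithmetic: on an all-zeroes page both
 * fields are 0, so the computed space is 0 and the page is reported as
 * having no free space, even though it is really just uninitialized.
 * That's what kept the old code from actively breaking on such pages.
 */
static Size
sketch_page_free_space(Page page)
{
	int			space;

	space = (int) ((PageHeader) page)->pd_upper -
		(int) ((PageHeader) page)->pd_lower;

	/* not even room for one line pointer -> report no free space */
	if (space < (int) sizeof(ItemIdData))
		return 0;

	return (Size) (space - sizeof(ItemIdData));
}

So as far as that check is concerned, a zeroed, never-initialized page
and a completely full page look the same.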
I wonder if we should just expand the logic we have for
RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could
just use it without any changes, but the name seems a bit confusing).
That would prevent the current weirdness that the buffer can be locked
by somebody else between the ReadBufferBI(P_NEW) and the
LockBuffer(BUFFER_LOCK_EXCLUSIVE). I think that'd allow us to
altogether drop the cleanup lock logic we currently have, and also
protect us against the FSM issue I'd outlined upthread?
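Roughly what I have in mind on the hio.c side -- just a sketch, untested,
and it assumes we'd either teach ReadBufferBI() to pass a read mode
through or call ReadBufferExtended() directly with the bulk-insert
strategy:

	/*
	 * Sketch: extend the relation and get the new page back already
	 * zeroed and exclusively locked, so no other backend can lock it
	 * between the extension and our own LockBuffer() call.
	 */
	buffer = ReadBufferExtended(relation, MAIN_FORKNUM, P_NEW,
								RBM_ZERO_AND_LOCK,
								bistate ? bistate->strategy : NULL);
	page = BufferGetPage(buffer);

	/* we already hold the content lock, so initialize the page right away */
	PageInit(page, BufferGetPageSize(buffer), 0);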
Greetings,
Andres Freund