Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date:	2021-06-10 16:57:08
Message-ID:	CAEze2WiU-crhst9Xtk=6sk8rBspD0LGE1N=cafVg091Twu4FQw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 10 Jun 2021 at 18:03, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Thu, Jun 10, 2021 at 8:49 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > Could you elaborate on what this "matches what we expect" entails?
> >
> > Apart from this, I'm also quite certain that the goto-branch that
> > created this infinite loop should have been dead code: In a correctly
> > working system, the GlobalVis*Rels should always be at least as strict
> > as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels
> > can be updated (i.e. move their horizon forward) during the vacuum. As
> > such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple
> > that also satisfies the condition of HeapTupleSatisfiesVacuum.
>
> It's true that these two similar functions should be in perfect
> agreement in general (given the same OldestXmin). That in itself
> doesn't mean that they must always agree about a tuple in practice,
> when they're called in turn inside lazy_scan_prune(). In particular,
> nothing stops a transaction that was in progress to
> heap_prune_satisfies_vacuum (when it saw some tuples it inserted)
> concurrently aborting. That will render the same tuples fully DEAD
> inside HeapTupleSatisfiesVacuum(). So we need to restart using the
> goto purely to cover that case. See the commit message of commit
> 8523492d4e3.

I totally overlooked that HeapTupleSatisfiesVacuumHorizon does the
heavyweight XID validation and does return HEAPTUPLE_DEAD in those
recently rolled back cases. Thank you for reminding me.
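
To make the race concrete, here is a toy model of the retry path. It is only a
sketch: ToyTuple, toy_classify, toy_prune and toy_scan_page are hypothetical
stand-ins, not PostgreSQL code, and the visibility check is reduced to the
state of the inserting transaction. The abort_between flag simulates the
inserter aborting after the prune pass but before the second check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are PostgreSQL's real types. */
typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactState;
typedef enum { TUPLE_LIVE, TUPLE_INSERT_IN_PROGRESS, TUPLE_DEAD } TupleVerdict;

typedef struct
{
    XactState inserter;  /* state of the transaction that inserted the tuple */
    bool      present;   /* false once pruning has removed the tuple */
} ToyTuple;

/* Toy visibility check: the verdict depends only on the inserter's state. */
static TupleVerdict
toy_classify(XactState inserter)
{
    switch (inserter)
    {
        case XACT_ABORTED:
            return TUPLE_DEAD;
        case XACT_IN_PROGRESS:
            return TUPLE_INSERT_IN_PROGRESS;
        default:
            return TUPLE_LIVE;
    }
}

/* Prune pass: removes the tuple only if it is DEAD right now. */
static void
toy_prune(ToyTuple *tup)
{
    if (tup->present && toy_classify(tup->inserter) == TUPLE_DEAD)
        tup->present = false;
}

/*
 * Scan one "page" holding a single tuple and return the number of prune
 * passes. If abort_between is set, the inserter aborts after the first
 * prune pass, so the second check sees a DEAD tuple that pruning left behind.
 */
static int
toy_scan_page(ToyTuple *tup, bool abort_between)
{
    int passes = 0;

retry:
    toy_prune(tup);
    passes++;

    /* In this model one concurrent abort can force at most one retry. */
    assert(passes <= 2);

    if (abort_between && passes == 1 && tup->inserter == XACT_IN_PROGRESS)
        tup->inserter = XACT_ABORTED;   /* simulated concurrent abort */

    /* Second check: a DEAD tuple still on the page means pruning missed it. */
    if (tup->present && toy_classify(tup->inserter) == TUPLE_DEAD)
        goto retry;

    return passes;
}
```

With abort_between set and an in-progress inserter, the scan takes two passes
and the tuple is gone afterwards; without it, one pass suffices and the tuple
stays on the page.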

> By "matches what we expect", I meant "involves a just-aborted
> transaction". We could defensively verify that the inserting
> transaction concurrently aborted at the point of retrying/calling
> heap_page_prune() a second time. If there is no aborted transaction
> involved (as was the case with this bug), then we can be confident
> that something is seriously broken.

I believe there are more cases than only the rolled back case, but
checking for those cases would potentially help, yes.
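
A minimal sketch of what such a defensive check might look like, covering only
the just-aborted case discussed above. The names (InserterState, RetryVerdict,
toy_check_retry) are hypothetical and not part of PostgreSQL; a real check
would have to look up the transaction status and account for the other
legitimate cases as well.

```c
/* Hypothetical stand-ins; none of these are PostgreSQL's real types. */
typedef enum { INS_IN_PROGRESS, INS_COMMITTED, INS_ABORTED } InserterState;
typedef enum { RETRY_EXPECTED, RETRY_SUSPICIOUS } RetryVerdict;

/*
 * Decide whether a retry of the prune pass is explained by a concurrent
 * abort: the inserter was in progress when pruning looked at the tuple and
 * has aborted by the time of the second check. Anything else is flagged
 * as suspicious in this simplified model.
 */
static RetryVerdict
toy_check_retry(InserterState at_prune, InserterState at_recheck)
{
    if (at_prune == INS_IN_PROGRESS && at_recheck == INS_ABORTED)
        return RETRY_EXPECTED;

    return RETRY_SUSPICIOUS;
}
```

A caller could treat RETRY_SUSPICIOUS as a reason to raise an error instead of
looping again.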

With regards,

Matthias van de Meent.

In response to

Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic at 2021-06-10 16:03:06 from Peter Geoghegan

Responses

Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic at 2021-06-10 17:07:36 from Peter Geoghegan
