From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic |
Date: | 2021-06-10 16:57:08 |
Message-ID: | CAEze2WiU-crhst9Xtk=6sk8rBspD0LGE1N=cafVg091Twu4FQw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, 10 Jun 2021 at 18:03, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Thu, Jun 10, 2021 at 8:49 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > Could you elaborate on what this "matches what we expect" entails?
> >
> > Apart from this, I'm also quite certain that the goto-branch that
> > created this infinite loop should have been dead code: In a correctly
> > working system, the GlobalVis*Rels should always be at least as strict
> > as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels
> > can be updated (i.e. move their horizon forward) during the vacuum. As
> > such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple
> > that also satisfies the condition of HeapTupleSatisfiesVacuum.
>
> It's true that these two similar functions should be in perfect
> agreement in general (given the same OldestXmin). That in itself
> doesn't mean that they must always agree about a tuple in practice,
> when they're called in turn inside lazy_scan_prune(). In particular,
> nothing stops a transaction that was in progress to
> heap_prune_satisfies_vacuum (when it saw some tuples it inserted)
> concurrently aborting. That will render the same tuples fully DEAD
> inside HeapTupleSatisfiesVacuum(). So we need to restart using the
> goto purely to cover that case. See the commit message of commit
> 8523492d4e3.
I totally overlooked that HeapTupleSatisfiesVacuumHorizon does the
heavyweight XID validation and returns HEAPTUPLE_DEAD in those
recently rolled-back cases. Thank you for reminding me.
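
To make the race described above concrete, here is a minimal toy model in Python. It is not the PostgreSQL code: the names `XactState`, `Verdict`, `visibility` and `scan_page` are hypothetical, and a single function stands in for both heap_prune_satisfies_vacuum and HeapTupleSatisfiesVacuum. It only shows why a transaction that aborts between the two checks forces one more pruning pass.

```python
from enum import Enum


class XactState(Enum):
    IN_PROGRESS = 1
    COMMITTED = 2
    ABORTED = 3


class Verdict(Enum):
    LIVE = 1
    INSERT_IN_PROGRESS = 2
    DEAD = 3


def visibility(xmin_state):
    """Toy stand-in for both visibility checks: classify a tuple
    purely by the state of the transaction that inserted it."""
    if xmin_state is XactState.ABORTED:
        return Verdict.DEAD
    if xmin_state is XactState.IN_PROGRESS:
        return Verdict.INSERT_IN_PROGRESS
    return Verdict.LIVE


def scan_page(tuples, xacts, abort_between=None):
    """tuples: list of inserting xids, one per tuple.
    xacts: dict mapping xid -> XactState (mutated if an abort is simulated).
    abort_between: xid that aborts once, between the two checks.
    Returns (surviving tuples, number of pruning passes)."""
    passes = 0
    while True:  # plays the role of the goto-based retry
        passes += 1
        # First check (pruning): drop every tuple that is DEAD right now.
        tuples = [x for x in tuples if visibility(xacts[x]) is not Verdict.DEAD]
        # Simulated concurrent abort after pruning, before the second check.
        if abort_between is not None:
            xacts[abort_between] = XactState.ABORTED
            abort_between = None
        # Second check: a DEAD tuple here means pruning missed it, so retry.
        if any(visibility(xacts[x]) is Verdict.DEAD for x in tuples):
            continue
        return tuples, passes
```

Calling `scan_page([100, 101], {100: XactState.COMMITTED, 101: XactState.IN_PROGRESS}, abort_between=101)` takes two passes and keeps only the tuple from xid 100; without the simulated abort, one pass suffices and both tuples survive.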
> By "matches what we expect", I meant "involves a just-aborted
> transaction". We could defensively verify that the inserting
> transaction concurrently aborted at the point of retrying/calling
> heap_page_prune() a second time. If there is no aborted transaction
> involved (as was the case with this bug), then we can be confident
> that something is seriously broken.
I believe there are more cases than only the rolled back case, but
checking for those cases would potentially help, yes.
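
The defensive verification suggested in the quoted text could be sketched roughly as follows, again as a toy Python model and not as PostgreSQL code. `PruneInconsistency` and `check_retry_is_expected` are hypothetical names, and the sketch assumes that an aborted inserter is the only legitimate reason for a retry; if other legitimate cases exist, they would have to be allowed as well.

```python
class PruneInconsistency(Exception):
    """Raised when a retry cannot be explained by an aborted transaction."""


def check_retry_is_expected(dead_xids, aborted_xids):
    """dead_xids: inserting xids of the tuples that forced the retry.
    aborted_xids: xids known to have aborted.
    Raises instead of letting the caller retry forever."""
    unexplained = sorted(set(dead_xids) - set(aborted_xids))
    if unexplained:
        raise PruneInconsistency(
            f"retry not explained by an abort: xids {unexplained}"
        )
```

The point of failing loudly is that a retry with no aborted transaction behind it would repeat indefinitely, which is the stuck behaviour reported in this thread.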
With regards,
Matthias van de Meent.