From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic |
Date: | 2021-06-09 15:42:34 |
Message-ID: | CAEze2Wg32Y9+WJfw=aofkRx1ZRFt_Ev6bNPc4PSaz7PjSFtZgQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, 9 Jun 2021 at 04:42, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Jun 08, 2021 at 05:47:28PM -0700, Peter Geoghegan wrote:
> > I don't have time to try this out myself today, but offhand I'm pretty
> > confident that this is sufficient to reproduce the underlying bug
> > itself. And if that's true then I guess it can't have anything to do
> > with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the
> > apparent similarity.
>
> Agreed. It took me a couple of minutes to get autovacuum to run in an
> infinite loop with a standalone instance. Nice catch, Justin!

I believe that I've found the culprit:
GetOldestNonRemovableTransactionId(rel) does not use exactly the same
conditions for returning OldestXmin as GlobalVisTestFor(rel) does.
As a result, the two can arrive at different minimum XIDs, which leads
to this failure.
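
As a toy illustration of this kind of drift (not PostgreSQL source), the sketch below models two selectors that pick a horizon for a relation from the same precomputed values but classify relations with slightly different conditions. All names (`ToyRel`, `ToyHorizons`, `select_horizon_a`, `select_horizon_b`) are hypothetical, and the particular condition that differs is invented for the example; the message does not say which conditions differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* hypothetical stand-in for a transaction ID */

/* Hypothetical relation properties; this is not PostgreSQL's Relation. */
typedef struct ToyRel
{
    bool        is_shared;
    bool        is_catalog;
} ToyRel;

/* Hypothetical per-category horizons, computed once and shared by both selectors. */
typedef struct ToyHorizons
{
    ToyXid      shared_oldest;
    ToyXid      catalog_oldest;
    ToyXid      data_oldest;
} ToyHorizons;

/* Selector A: treats catalogs as their own category. */
static ToyXid
select_horizon_a(const ToyHorizons *h, const ToyRel *rel)
{
    if (rel->is_shared)
        return h->shared_oldest;
    if (rel->is_catalog)
        return h->catalog_oldest;
    return h->data_oldest;
}

/* Selector B: lacks the catalog check, so catalogs fall through to "data". */
static ToyXid
select_horizon_b(const ToyHorizons *h, const ToyRel *rel)
{
    if (rel->is_shared)
        return h->shared_oldest;
    return h->data_oldest;
}

/* True when both selectors pick the same horizon for this relation. */
static bool
selectors_agree(const ToyHorizons *h, const ToyRel *rel)
{
    return select_horizon_a(h, rel) == select_horizon_b(h, rel);
}

/* Debug-build check that would trip as soon as the two selectors diverge. */
static void
assert_selectors_agree(const ToyHorizons *h, const ToyRel *rel)
{
    assert(selectors_agree(h, rel));
}
```

In this model, a relation with `is_catalog` set gets `catalog_oldest` from one selector and `data_oldest` from the other, while shared and plain relations get the same answer from both. Code that mixes the two results for the same relation is then working from inconsistent cutoffs.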

The attached patch fixes this inconsistency, and adds a set of asserts
to ensure that GetOldestNonRemovableTransactionId is equal to the
maybe_needed of the GlobalVisTest of that relation, plus some in
GlobalVisUpdateApply such that it will fail whenever it is called with
arguments that would move the horizons in the wrong direction. Note
that there was no problem in GlobalVisUpdateApply, but the asserts
helped me determine that that part was not the source of the problem,
and I think that having this safeguard is a net positive.
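
The second kind of safeguard, rejecting an update that would move a horizon backwards, can be sketched as follows. This is a simplified model under stated assumptions, not the patch itself: `ToyVisState`, `toy_update_is_valid` and `toy_update_apply` are hypothetical names, the state holds only a single `maybe_needed` horizon, and XIDs are modelled as 64-bit values that never wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyFullXid;        /* hypothetical 64-bit, non-wrapping XID */

/* Hypothetical visibility state holding one horizon. */
typedef struct ToyVisState
{
    ToyFullXid  maybe_needed;
} ToyVisState;

/* An update is valid only if the horizon stays put or advances. */
static bool
toy_update_is_valid(const ToyVisState *state, ToyFullXid new_maybe_needed)
{
    return new_maybe_needed >= state->maybe_needed;
}

/*
 * Apply an update. In assert-enabled builds this aborts when the caller
 * tries to move the horizon backwards; with NDEBUG defined the check
 * compiles away and the value is stored unconditionally.
 */
static void
toy_update_apply(ToyVisState *state, ToyFullXid new_maybe_needed)
{
    assert(toy_update_is_valid(state, new_maybe_needed));
    state->maybe_needed = new_maybe_needed;
}
```

A check like this costs nothing in production builds, and in debug builds it pinpoints the first call that violates the invariant rather than a later symptom such as an endless retry loop.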

Another approach might be changing GlobalVisTestFor(rel) instead to
reflect the conditions in GetOldestNonRemovableTransactionId.

With the attached prototype patch, I was unable to reproduce the
problematic case in 10 minutes. Without it, I hit the problematic
behaviour within seconds.

With regards,

Matthias

Attachment | Content-Type | Size |
---|---|---|
v1-0001-Fix-a-bug-in-GetOldestNonRemovableTransactionId.patch | text/x-patch | 2.9 KB |