Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae

From: Bowen Shi <zxwsbg12138(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date: 2024-05-14 03:42:01
Message-ID: CAM_vCuc=S=djaSyVUQDDJ7fmxQqVr4dB-iv2mdNMSWA=GRb6Dg@mail.gmail.com
Lists: pgsql-bugs

On Mon, May 13, 2024 at 10:42 PM Melanie Plageman <melanieplageman(at)gmail(dot)com>
wrote:

> On Sun, May 12, 2024 at 11:19 PM Bowen Shi <zxwsbg12138(at)gmail(dot)com> wrote:
> >
> > Hi,
> >>
> >> Obviously we should actually fix this on back branches, but could we
> >> at least make the retry loop interruptible in some way so people could
> >> use pg_cancel/terminate_backend() on a stuck autovacuum worker or
> >> vacuum process?
> >
> >
> > If the problem happens in versions <= PG 16, we don't have a good
> > solution (the vacuum process holds the exclusive lock, which causes
> > checkpoints to hang).
> >
> > Maybe we can make the retry loop interruptible first. However, since we
> > are using START_CRIT_SECTION, we cannot simply use CHECK_FOR_INTERRUPTS to
> > handle it.
>
> As far as I can tell, in 14 and 15, the versions where the issue
> reported here is present, there is not a critical section in the
> section of code looped through in the retry loop in lazy_scan_prune().
>

You are correct. I tried again, adding CHECK_FOR_INTERRUPTS() to the retry
loop. When I then tried to interrupt the loop with pg_terminate_backend(),
InterruptHoldoffCount was 1, so the interrupt was not processed and the
vacuum did not end.
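
A minimal sketch of why the added check appears to have no effect, written as
a standalone toy model rather than PostgreSQL code. The variable names mirror
PostgreSQL's InterruptHoldoffCount, CritSectionCount, and InterruptPending,
but process_interrupts_model(), check_for_interrupts_model(), and
simulate_retry_loop() are hypothetical stand-ins. It assumes that the real
ProcessInterrupts() returns without acting whenever InterruptHoldoffCount or
CritSectionCount is nonzero. Why the count is 1 here is not established; a
held LWLock (for example a buffer content lock) is one possibility, since
LWLockAcquire() holds off interrupts.

```c
#include <stdbool.h>

/* Toy stand-ins for the PostgreSQL globals of the same names. */
static volatile int  InterruptHoldoffCount = 0;
static volatile int  CritSectionCount = 0;
static volatile bool InterruptPending = false;

/*
 * Simplified model of ProcessInterrupts(): while interrupts are held off
 * (or we are in a critical section), leave the request pending and return.
 */
static bool
process_interrupts_model(void)
{
    if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
        return false;           /* request stays pending, nothing happens */

    InterruptPending = false;
    return true;                /* the real code would raise an error here */
}

/* Simplified model of CHECK_FOR_INTERRUPTS(). */
static bool
check_for_interrupts_model(void)
{
    return InterruptPending && process_interrupts_model();
}

/*
 * Hypothetical retry loop with an interrupt check on every iteration.
 * Returns the iteration at which the loop was interrupted, or -1 if it
 * ran for max_iterations without ever being interrupted.
 */
int
simulate_retry_loop(int holdoff_count, int max_iterations)
{
    InterruptHoldoffCount = holdoff_count;
    CritSectionCount = 0;
    InterruptPending = true;    /* as if a terminate request had arrived */

    for (int i = 0; i < max_iterations; i++)
    {
        if (check_for_interrupts_model())
            return i;
    }
    return -1;
}
```

In this model, simulate_retry_loop(1, 1000) returns -1 (the loop is never
interrupted), while simulate_retry_loop(0, 1000) returns 0 (the first check
ends the loop). That matches the observation above: the check is reached,
but it does nothing while the holdoff count is nonzero.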

> We can actually fix the particular issue I reproduced with the
> attached patch. However, I think it is still worth making the retry
> loop interruptible in case there are other ways to end up infinitely
> looping in the retry loop in lazy_scan_prune().

I attempted to apply the patch on the REL_15_STABLE branch, but encountered
some conflicts. May I ask which branch you are using?

--
Regards
Bowen Shi
