Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date: 2024-06-24 14:53:28
Message-ID: CAAKRu_Yh1aawR0RuRnzczGJfkUXE1oVV-+qsjL66srziV4vc-w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 24, 2024 at 4:27 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 21/06/2024 03:02, Peter Geoghegan wrote:
> > On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> >
> >> The repro forces a round of index vacuuming after the standby
> >> reconnects and before pruning a dead tuple whose xmax is older than
> >> OldestXmin.
> >>
> >> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> >> calls GetOldestNonRemovableTransactionId(), thereby updating the
> >> backend's GlobalVisState and moving maybe_needed backwards.
> >
> > Right. I saw details exactly consistent with this when I used GDB
> > against a production instance.
> >
> > I'm glad that you were able to come up with a repro that involves
> > exactly the same basic elements, including index page deletion.
>
> Would it be possible to make it robust so that we could always run it
> with "make check"? This seems like an important corner case to
> regression test.

I'd have to look into how to stabilize the parts that seem prone to
flaking. I can probably stabilize the vacuum bit by polling
pg_stat_activity until the vacuum backend is waiting to acquire the
cleanup lock.
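
A minimal sketch of that pg_stat_activity check, in Python only for
illustration (an actual regression test would more likely live in the
tree's own test framework). Assumptions not taken from the repro:
run_query is a hypothetical callable that executes SQL with
psycopg-style named parameters and returns the single boolean result,
and a backend blocked on a cleanup lock reports wait_event_type =
'BufferPin'.

```python
import time
from typing import Callable

# Assumption: a backend blocked waiting for a cleanup lock shows up in
# pg_stat_activity with wait_event_type = 'BufferPin'.
WAITING_FOR_CLEANUP_LOCK_SQL = """
SELECT count(*) >= 1
FROM pg_stat_activity
WHERE pid = %(pid)s
  AND wait_event_type = 'BufferPin'
"""


def wait_for_cleanup_lock_wait(
    run_query: Callable[[str, dict], bool],  # hypothetical query runner
    vacuum_pid: int,
    timeout_s: float = 180.0,
    interval_s: float = 0.1,
) -> bool:
    """Poll until the vacuum backend is seen waiting, or until timeout.

    Returns True if the wait was observed, False if the timeout expired.
    The query is always issued at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if run_query(WAITING_FOR_CLEANUP_LOCK_SQL, {"pid": vacuum_pid}):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The test would call this right after launching vacuum in a separate
session and only proceed to the next step once it returns True, so the
ordering no longer depends on timing.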

I don't, however, see a good way around the large amount of data
required to trigger more than one round of index vacuuming. I could
generate the data more efficiently than I do here (generate_series()
in the FROM clause), perhaps with COPY. I know it is too slow as it
stands to go in a regularly run test, but I don't have an intuition
for how fast it would have to be to be acceptable. Is there a set of
additional, slower tests that we don't always run? I didn't follow how
the wraparound test ended up, but that seems like one that would have
been slow.

- Melanie
