Re: Intermittent buildfarm failures on wrasse

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Intermittent buildfarm failures on wrasse
Date: 2022-04-13 23:07:01
Message-ID: CAH2-WznszHSj2iZO4fO7r2NWM1TVFut-fenxnmnvYPSm2_5qhA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 13, 2022 at 3:54 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> After a bit more navel-contemplation I see a way that the pgstats
> work could have changed timing in this area. We used to have a
> rate limit on how often stats reports would be sent to the
> collector, which'd ensure half a second or so delay before a
> transaction's change counts became visible to the autovac daemon.
> I've not looked at the new code, but I'm betting that that's gone
> and the autovac launcher might start a worker nearly immediately
> after some foreground process finishes inserting some rows.
> So that could result in autovac activity occurring concurrently
> with test_setup where it didn't before.

But why should it matter? The test_setup.sql VACUUM of tenk1 should
leave relallvisible and relpages in the same state, either way (or
very close to it).

The only way that it seems like it could matter is if OldestXmin was
held back during test_setup.sql's execution of the VACUUM command.
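
To make that concrete, here is a toy model. It is not PostgreSQL code: the
page layout, the function name count_all_visible, and the XID values are all
made up for illustration, and XID wraparound is ignored. The one rule it
assumes is that a page can only be marked all-visible when every live tuple
on it was inserted by an XID that precedes OldestXmin.

```python
from typing import List

# Each page is a list of xmin values (the inserting XID of each committed,
# live tuple). This layout is a simplification, not the real heap format.
Page = List[int]


def count_all_visible(pages: List[Page], oldest_xmin: int) -> int:
    """Count pages on which every tuple's xmin is older than oldest_xmin."""
    return sum(1 for page in pages if all(xmin < oldest_xmin for xmin in page))


# Hypothetical table: three pages filled by transaction 100, standing in
# for the rows loaded into tenk1.
pages = [[100, 100], [100, 100], [100]]

# No older snapshot is open, so the cutoff is past the loading transaction.
relallvisible_normal = count_all_visible(pages, oldest_xmin=101)

# A concurrent backend still holds a snapshot taken before XID 100
# committed, so the cutoff is held back to 100.
relallvisible_held_back = count_all_visible(pages, oldest_xmin=100)

print(relallvisible_normal, relallvisible_held_back)
```

In this sketch the same VACUUM over the same pages yields a different
all-visible count purely because the cutoff moved, while the page count
itself is unchanged.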

> As to what to do about it ... maybe apply the FREEZE and
> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
> not sufficient.

BTW, the work on VACUUM for Postgres 15 probably makes VACUUM test
flappiness issues less of a problem -- unless they're issues involving
something holding back OldestXmin when it shouldn't (in which case the
new work won't have any effect on test stability). I would expect that
to be the case, at least, since VACUUM now does almost all of the same
work for any individual page that it cannot get a cleanup lock on. There is
surprisingly little difference between a page that gets processed by
lazy_scan_prune and a page that gets processed by lazy_scan_noprune.

--
Peter Geoghegan
