Re: Intermittent buildfarm failures on wrasse

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Intermittent buildfarm failures on wrasse
Date: 2022-04-13 23:13:07
Message-ID: 9F59C4BB-3C82-44B0-9B10-4A2CCC3DE552@anarazel.de
Lists: pgsql-hackers

Hi,

On April 13, 2022 7:06:33 PM EDT, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> After a bit more navel-contemplation I see a way that the pgstats
>> work could have changed timing in this area. We used to have a
>> rate limit on how often stats reports would be sent to the
>> collector, which'd ensure half a second or so delay before a
>> transaction's change counts became visible to the autovac daemon.
>> I've not looked at the new code, but I'm betting that that's gone
>> and the autovac launcher might start a worker nearly immediately
>> after some foreground process finishes inserting some rows.
>> So that could result in autovac activity occurring concurrently
>> with test_setup where it didn't before.
>
>It's not quite clear to me why the manual vacuum wouldn't just cancel
>the autovacuum and complete the job. I can't quite see how there's
>room for competing page locks here. Also, see [1]. One of the
>reported failing tests there is the same as one of the failing tests
>on wrasse. My investigation for the AIO branch found that
>relallvisible was not equal to relpages. I don't recall the reason why
>that was happening now.
>
>> As to what to do about it ... maybe apply the FREEZE and
>> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
>> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
>> not sufficient.
>
>We should likely try and confirm it's due to relallvisible first.

We had this issue before, and not just on the AIO branch. I'm on my phone right now, so I won't look up references.

IIRC the problem here isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all-visible. An independent autovac worker starting is enough to cause that, for example. Previously the data load and the vacuum were further apart in time, which prevented this kind of issue.
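
To make the horizon argument concrete, here is a rough toy model in Python. It is only a sketch and not PostgreSQL's actual logic: the names Page, horizon and count_all_visible are made up for illustration, and it assumes the horizon is simply the oldest xmin held by any other backend's snapshot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    xmins: List[int]  # XIDs of the transactions that inserted each tuple


def horizon(next_xid: int, snapshot_xmins: List[int]) -> int:
    # Hypothetical simplification: the horizon is the oldest xmin held by
    # any backend's snapshot, or the next unassigned XID if there are none.
    return min(snapshot_xmins, default=next_xid)


def count_all_visible(pages: List[Page], cutoff: int) -> int:
    # A page qualifies only if every tuple on it was inserted before the cutoff.
    return sum(1 for p in pages if all(x < cutoff for x in p.xmins))


# The data load ran as XID 100 and has committed; 101 is the next XID.
pages = [Page([100, 100]), Page([100])]

# Nothing else running: the horizon is 101, so both pages qualify.
quiet = count_all_visible(pages, horizon(101, []))

# A worker whose snapshot was taken while XID 100 was still running
# holds the horizon at 100, so no page qualifies.
busy = count_all_visible(pages, horizon(101, [100]))

print(quiet, busy, len(pages))
```

In the second case the all-visible count ends up below the page count even though no page was skipped, which would match the relallvisible != relpages symptom mentioned above.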

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
