Re: pgsql: Attempt to fix unstable regression tests, take 2

From: David Rowley <dgrowley(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <drowley(at)postgresql(dot)org>, pgsql-committers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: pgsql: Attempt to fix unstable regression tests, take 2
Date: 2020-03-31 23:44:09
Message-ID: CAHoyFK9pHKPHyEp35QXo9NzkFOeupyRNONuEFgej4U54=Cmj2w@mail.gmail.com
Lists: pgsql-committers

On Tue, 31 Mar 2020 at 15:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've been trying to reproduce this by dint of running just the stats_ext
> script, over and over in a loop. I've not had any success on fast
> machines, but on a slow one (florican's host) I got this after a few
> hundred iterations:

I've had a 13-year-old laptop running just stats_ext in a loop for
about an hour now. I managed to get 1000 runs without any failure.
Trying again with autovacuum_naptime set to 1s... 1000 runs, and
nothing yet.
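
For anyone repeating the experiment, one way to shorten the naptime without a restart is sketched below. It assumes superuser access on a scratch cluster; the 1s value is the one mentioned above, and the rest is illustrative only.

```sql
-- Written to postgresql.auto.conf; autovacuum_naptime only needs a reload.
ALTER SYSTEM SET autovacuum_naptime = '1s';
SELECT pg_reload_conf();

-- Confirm the running value.
SHOW autovacuum_naptime;

-- Undo afterwards:
-- ALTER SYSTEM RESET autovacuum_naptime;
-- SELECT pg_reload_conf();
```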

If you disable autovacuum on the problem table, can you still
reproduce the failure on that machine?
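
A minimal sketch of what that could look like on an existing table. The table name problem_table is a placeholder, not one of the actual tables in stats_ext.

```sql
-- problem_table is a hypothetical name; substitute the table under test.
ALTER TABLE problem_table SET (autovacuum_enabled = off);

-- Check that the storage parameter took effect.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'problem_table';
```

Note that autovacuum_enabled = off does not stop anti-wraparound autovacuum, though that should not come into play in a short test loop.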

> Now this *IS* autovacuum interference, but it's hardly autovacuum's fault:
> the test script is supposing that autovac won't come in before it does a
> manual analyze, and that's just unsafe on its face.

Why would that matter? The manual operation will just overwrite what
autovacuum did. Obviously, there can't be any overlap due to the
ShareUpdateExclusiveLock.

My suspicion was that autovacuum ran a vacuum *after* the VACUUM
(ANALYZE). I've not studied the code, but I've had thoughts that the
manual operation might have slotted in just between when autovacuum
checked what work there was to do and when it actually did the work.
Unsure how likely that is given that we have table_recheck_autovac().
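
One way to check that ordering after a failing run would be to compare the manual and automatic timestamps in pg_stat_user_tables. This is only a sketch: problem_table is a placeholder name, and the counters are reported asynchronously, so they can lag slightly behind the actual operations.

```sql
-- problem_table is a hypothetical name; substitute the table under test.
SELECT relname,
       last_vacuum,       -- most recent manual VACUUM
       last_autovacuum,   -- most recent autovacuum-launched vacuum
       last_analyze,      -- most recent manual ANALYZE
       last_autoanalyze,  -- most recent autovacuum-launched analyze
       autovacuum_count,
       autoanalyze_count
FROM pg_stat_user_tables
WHERE relname = 'problem_table';
```

If last_autovacuum or last_autoanalyze is later than the manual timestamps, autovacuum did get in after the manual VACUUM (ANALYZE).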

> I'm thinking that what we ought to do is have this test disable autovac
> altogether on its tables, ie
> CREATE TABLE ... WITH (autovacuum_enabled = off);
>
> However, I remain suspicious that there's something else going on,
> unrelated to autovac. All the buildfarm cases so far have been
> small underestimates, one or two rows, so they look entirely different
> from the example above. Even if autovacuum is firing unexpectedly,
> how would it cause such results?

Perhaps we can remain suspicious if we still see failures after fixing
it to disable autovacuum on these tables. It seems to happen often
enough that if we don't see it again in a week, then we might be able
to assume that was the issue.

David
