Re: Parallel Append subplan order instability on aye-aye

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append subplan order instability on aye-aye
Date: 2019-05-21 03:15:47
Message-ID: 28041.1558408547@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Note that in the discussion that led up to 624e440a, we never did
>> think that we'd completely explained the original irreproducible
>> failure.
>>
>> I think I've seen a couple of other cases of this same failure
>> in the buildfarm recently, but too tired to go looking right now.

> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.

I got around to excavating in the buildfarm archives, and found a round
dozen of more-or-less-similar incidents. I went back 18 months, which
by coincidence (i.e., I didn't realize it till just now) is just about
the time since 624e440a:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2018-01-14%2006%3A30%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-03-02%2011%3A30%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-11%2023%3A25%3A46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-15%2000%3A02%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2018-04-05%2003%3A22%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07%2018%3A32%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-08%2019%3A55%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2018-04-23%2010%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-04-19%2001%3A50%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-23%2021%3A23%3A12
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2019-05-14%2014%3A59%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

There are two really interesting things about this list:

* All the failures are on HEAD. This implies that the issue was
not there when we forked off v11, else we'd surely have seen an
instance on that branch by now. The dates above are consistent
with the idea that we eliminated the problem in roughly May 2018,
and then it came back about a month ago. (Of course, maybe this
just traces to unrelated changes in test timing.)

* All the failures are in the pg_upgrade test (some from before and
some from after we switched that test from serial to parallel schedule).
This makes very little sense; how is that meaningfully different
from the buildfarm's straight-up invocations of "make check" and
"make installcheck"?
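
For readers who want to check the date pattern in the first point, here is a
minimal sketch (not part of the original message) that parses the animal name
and timestamp out of the buildfarm URLs listed above and reports the longest
quiet period between consecutive failures. The helper name `parse_failure` and
the variable names are illustrative assumptions; only the URLs come from the
list.

```python
from datetime import datetime
from urllib.parse import urlparse, parse_qs

# The twelve buildfarm log URLs quoted in the message, verbatim.
BASE = "https://buildfarm.postgresql.org/cgi-bin/show_log.pl"
URLS = [BASE + q for q in """
?nm=francolin&dt=2018-01-14%2006%3A30%3A02
?nm=hornet&dt=2018-03-02%2011%3A30%3A19
?nm=longfin&dt=2018-03-11%2023%3A25%3A46
?nm=longfin&dt=2018-03-15%2000%3A02%3A04
?nm=spurfowl&dt=2018-04-05%2003%3A22%3A05
?nm=desmoxytes&dt=2018-04-07%2018%3A32%3A02
?nm=termite&dt=2018-04-08%2019%3A55%3A06
?nm=damselfly&dt=2018-04-23%2010%3A00%3A15
?nm=piculet&dt=2019-04-19%2001%3A50%3A08
?nm=prion&dt=2019-04-23%2021%3A23%3A12
?nm=sungazer&dt=2019-05-14%2014%3A59%3A43
?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10
""".split()]


def parse_failure(url):
    """Return (animal, timestamp) from one show_log.pl URL (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)  # parse_qs also undoes the %20 / %3A escapes
    stamp = datetime.strptime(query["dt"][0], "%Y-%m-%d %H:%M:%S")
    return query["nm"][0], stamp


# Sort by timestamp, then measure the gap between each consecutive pair.
failures = sorted((parse_failure(u) for u in URLS), key=lambda f: f[1])
gaps = [(later[1] - earlier[1], earlier, later)
        for earlier, later in zip(failures, failures[1:])]
longest_gap, before, after = max(gaps, key=lambda g: g[0])

print(f"{len(failures)} failures on {len({name for name, _ in failures})} animals")
print(f"longest quiet period: {longest_gap.days} days, "
      f"from {before[0]} ({before[1]:%Y-%m-%d}) to {after[0]} ({after[1]:%Y-%m-%d})")
```

The quiet period it reports runs from the damselfly failure in late April 2018
to the piculet failure in April 2019, which is the gap the first point refers
to. It says nothing about why the failures stopped or resumed.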

Note that I excluded a bunch of cases where we managed to run
select_parallel despite having suffered failures earlier in the
test run, typically failures that caused the sanity_check test
to not run. These led to diffs in the X_star queries that look
roughly similar to these, but not the same.

regards, tom lane
