Re: tuplesort test coverage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplesort test coverage
Date: 2019-12-12 14:27:04
Message-ID: 3244.1576160824@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> I pushed this now. We'll see what the slower buildfarm animals say. I'll
> try to see how long they took in a few days.

friarbird (a CLOBBER_CACHE_ALWAYS animal) just showed a failure in this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2019-12-12%2006%3A20%3A02

================== pgsql.build/src/test/regress/regression.diffs ===================
diff -U3 /pgbuild/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/tuplesort.out /pgbuild/root/HEAD/pgsql.build/src/test/regress/results/tuplesort.out
--- /pgbuild/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/tuplesort.out 2019-11-13 19:54:11.000000000 -0500
+++ /pgbuild/root/HEAD/pgsql.build/src/test/regress/results/tuplesort.out 2019-12-12 08:25:23.000000000 -0500
@@ -625,13 +625,13 @@
Group Key: a.col12
Filter: (count(*) > 1)
-> Merge Join
- Merge Cond: (a.col12 = b.col12)
- -> Sort
- Sort Key: a.col12 DESC
- -> Seq Scan on test_mark_restore a
+ Merge Cond: (b.col12 = a.col12)
-> Sort
Sort Key: b.col12 DESC
-> Seq Scan on test_mark_restore b
+ -> Sort
+ Sort Key: a.col12 DESC
+ -> Seq Scan on test_mark_restore a
(14 rows)

:qry;

Since a and b are exactly the same table, in principle it's a matter of
chance which one the planner will put on the outside of the join.
I think what happened here is that the test ran long enough for
autovacuum/autoanalyze to come along and scan the table, changing its
stats in between where the planner picked up the stats for a and those
for b, and we ended up making the opposite join order choice.

I considered fixing this by adding some restriction clause on b so
that the join order choice isn't such a coin flip. But it's not
clear that the problem couldn't recur anyway --- the table stats
would change significantly on auto-analyze, since the test script
isn't bothering to create any stats itself.

What seems like a simpler and more reliable fix is to make
test_mark_restore a temp table, thus keeping autovac away from it.
Is there a reason in terms of the test's goals not to do that?

Also ... why in the world does the script drop its tables at the end
with IF EXISTS? They'd better exist at that point. I object
to the DROP IF EXISTS up at the top, too. The regression tests
do not need to be designed to deal with an unpredictable start state,
and coding them to do so can have no effect other than possibly
masking problems.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Li Japin 2019-12-12 14:34:20 Duplicate function call on timestamp2tm
Previous Message Jeremy Finzel 2019-12-12 13:52:14 Re: Index corruption / planner issue with one table in my pg 11.6 instance