Why the buildfarm is all pink

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Why the buildfarm is all pink
Date: 2013-12-11 00:55:12
Message-ID: 14841.1386723312@sss.pgh.pa.us
Lists: pgsql-hackers

I was surprised to see that my back-patches of the recent SubLink
unpleasantness were failing on many of the buildfarm members, but
only in the 9.1 and 9.0 branches. The difficulty appears to be
that the EXPLAIN output for the new test query changes depending on
whether or not "tenk1" has been analyzed yet. In 9.2 and up,
it reliably has been, because create_index runs first and that script
does this:

create_index.sql:901:vacuum analyze tenk1; -- ensure we get consistent plans here

But the older branches lack that. Running the tests serially
usually fails in 9.1 and 9.0, and likely would fail in 8.4 except
that that branch isn't printing the selected plan for lack of
EXPLAIN (COSTS OFF). Parallel tests sometimes succeed (and
did for me), because the subselect test runs concurrently with
"aggregates" and "join", which have

aggregates.sql:211:analyze tenk1; -- ensure we get consistent plans here
join.sql:333:analyze tenk1; -- ensure we get consistent plans here

so depending on timing, one of those might have gotten the job done,
or maybe autovacuum would show up in time to save the day.

We need a more consistent strategy for this :-(

The minimum-change strategy for getting the buildfarm green again
would be to insert another ad-hoc "analyze tenk1" into subselect.sql
in the back branches. I don't particularly want to fix it that way,
though, because it'd just be a problem waiting to happen anytime
someone back-patches a bug fix that includes EXPLAIN output.

What I think would be the best strategy, on the whole, is to put
a whole-database "ANALYZE;" at the end of the "copy" regression test,
which is the one that loads up tenk1 and the other large test tables.
It also comes after the tests that load up small static tables such
as int4_tbl. This would ensure that all the tables that we typically
use for one-off EXPLAIN tests are analyzed early in the proceedings.
Then we could get rid of the various ad-hoc analyzes that have snuck
into various tests.
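As a sketch only, the proposal could look like the lines below. It assumes the statement is simply appended as the last line of the "copy" test script. The pg_stats query is an illustrative check added here and is not part of the proposal.

```sql
-- Sketch: proposed final statement of the "copy" regression test.
-- tenk1 and the other large test tables have just been loaded, and the
-- small static tables such as int4_tbl were filled by earlier tests,
-- so a database-wide ANALYZE covers all of them in one step.
ANALYZE;

-- Illustrative check only (not part of the proposal): ANALYZE writes
-- pg_statistic before it returns, so these tables should now show
-- per-column statistics through the pg_stats view.
SELECT tablename, count(*) AS columns_with_stats
FROM pg_stats
WHERE tablename IN ('tenk1', 'int4_tbl')
GROUP BY tablename
ORDER BY tablename;
```

Once this runs early in the schedule, later tests that print plans with EXPLAIN (COSTS OFF) no longer depend on whether "aggregates", "join", or autovacuum happened to analyze tenk1 first.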

While I'm on the subject ... I noticed that the recently-added
matview test has this:

matview.sql:133:VACUUM ANALYZE;

This doesn't make me happy. Aside from the sheer waste of cycles
involved in re-analyzing the entire regression database, this
test runs in parallel with half a dozen others, and it could cause
plan instability in those. Of course, if it does, then most likely
those tests have a hazard from autovacuum anyway. But this still
looks to me like a poor bit of test design.

Anyway, bottom line is that I think we need to institute, and
back-patch, some consistent scheme for when to analyze the standard
tables during the regression tests, so that we don't have hazards
like this for tests that want to check what plan gets selected.

Comments?

regards, tom lane
