From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Something is rotten in the state of Denmark... |
Date: | 2015-04-01 23:05:46 |
Message-ID: | 20021.1427929546@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I wrote:
> Observe these recent buildfarm failures:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2015-03-21%2000%3A30%3A02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&dt=2015-03-23%2004%3A17%3A01
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2015-03-31%2023%3A30%3A02
> Three similar-looking failures, on two different machines, in a regression
> test that has existed for less than three weeks. Something is very wrong.
I've been able to reproduce this. The triggering event seems to be that
the "VACUUM FULL pg_am" in vacuum.sql has to happen while another backend
is starting up. With a ten-second delay inserted at the bottom of
PerformAuthentication(), it's trivial to hit it manually. The reason we'd
not seen this before the rolenames.sql test was added is that none of the
other tests that run concurrently with vacuum.sql perform mid-test
reconnections, or ever have AFAIR. So as long as they all managed to
start up before vacuum.sql got to the dangerous step, no problem.
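The timing argument above is an interval-overlap condition: the failure needs VACUUM FULL to execute while some other backend is still inside its start-up window. The toy Python model below is only an illustration. The functions `startup_window` and `hits_race`, and every timing value in it, are hypothetical, and none of it is PostgreSQL code.

```python
from typing import List, Tuple

# (start, end) of one backend's start-up, in seconds. All timings here are
# hypothetical illustrations, not measurements.
Interval = Tuple[float, float]


def startup_window(connect_at: float, startup_cost: float, auth_delay: float = 0.0) -> Interval:
    # auth_delay models a sleep added at the end of PerformAuthentication().
    return (connect_at, connect_at + startup_cost + auth_delay)


def hits_race(vacuum_at: float, windows: List[Interval]) -> bool:
    # The race needs VACUUM FULL to run while some backend is still starting.
    return any(start <= vacuum_at < end for start, end in windows)


VACUUM_AT = 2.0  # hypothetical moment vacuum.sql reaches VACUUM FULL pg_am

# Only initial connections, all made at t=0: start-up is long over by t=2.0.
initial = [startup_window(0.0, 0.05) for _ in range(5)]
print(hits_race(VACUUM_AT, initial))  # False

# A test that reconnects mid-run can open a new window that happens to
# straddle the VACUUM FULL.
print(hits_race(VACUUM_AT, initial + [startup_window(1.98, 0.05)]))  # True

# Manual reproduction: with a 10 s sleep in authentication, a session that
# starts connecting at t=0 is still starting up when an already-connected
# session runs VACUUM FULL at t=2.
print(hits_race(2.0, [startup_window(0.0, 0.05, auth_delay=10.0)]))  # True
```

The model shows why widening the start-up window with an artificial delay turns a rare coincidence into a reliable reproduction.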
I've not fully tracked it down, but I think the blame falls on the
MVCC-snapshots-for-catalog-scans patch: it appears to be trying to
read pg_am's pg_class entry with a snapshot that's too old, possibly
because it assumes that sinval (shared invalidation) signaling is already
active, which I think ain't so.
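To make the suspected mechanism concrete, here is a toy Python model of the hypothesis. It is not PostgreSQL code: `ToyCatalog`, `ToyBackend`, the relfilenode numbers and the error text are all made up. A backend caches a catalog snapshot that is supposed to be discarded when an invalidation message arrives. If the backend is not yet receiving invalidations, it keeps the stale snapshot, still sees the pre-VACUUM pg_class row, and tries to open a file that VACUUM FULL has already replaced.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass
class TupleVersion:
    """One version of a pg_class-like row (toy model only)."""
    relname: str
    relfilenode: int
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that superseded it, if any


class ToyCatalog:
    def __init__(self) -> None:
        self.next_xid = 100
        self.committed: Set[int] = {1}
        self.versions: List[TupleVersion] = [TupleVersion("pg_am", 2601, xmin=1)]
        self.files: Set[int] = {2601}           # relfilenodes present "on disk"
        self.listeners: List["ToyBackend"] = []  # backends receiving invalidations

    def snapshot(self) -> FrozenSet[int]:
        # A snapshot here is simply the set of xids committed when it was taken.
        return frozenset(self.committed)

    def vacuum_full(self, relname: str) -> None:
        # Rewrite the relation into a new file and supersede its catalog row.
        xid = self.next_xid
        self.next_xid += 1
        old = next(v for v in self.versions if v.relname == relname and v.xmax is None)
        new_node = max(self.files) + 1
        old.xmax = xid
        self.versions.append(TupleVersion(relname, new_node, xmin=xid))
        self.files.add(new_node)
        self.committed.add(xid)
        self.files.discard(old.relfilenode)  # old file gone after commit (simplified)
        for backend in self.listeners:       # broadcast the invalidation
            backend.invalidate()


class ToyBackend:
    def __init__(self, catalog: ToyCatalog, sinval_active: bool) -> None:
        self.catalog = catalog
        self.catalog_snapshot: Optional[FrozenSet[int]] = None
        if sinval_active:
            catalog.listeners.append(self)

    def invalidate(self) -> None:
        # An invalidation message drops the cached catalog snapshot.
        self.catalog_snapshot = None

    def get_catalog_snapshot(self) -> FrozenSet[int]:
        if self.catalog_snapshot is None:
            self.catalog_snapshot = self.catalog.snapshot()
        return self.catalog_snapshot

    def open_relation(self, relname: str) -> int:
        snap = self.get_catalog_snapshot()
        for v in self.catalog.versions:
            visible = v.xmin in snap and (v.xmax is None or v.xmax not in snap)
            if v.relname == relname and visible:
                if v.relfilenode not in self.catalog.files:
                    raise FileNotFoundError(f"could not open file for relfilenode {v.relfilenode}")
                return v.relfilenode
        raise LookupError(relname)


def starting_backend_opens_pg_am(sinval_active: bool) -> str:
    catalog = ToyCatalog()
    backend = ToyBackend(catalog, sinval_active)
    backend.get_catalog_snapshot()  # snapshot taken early during start-up
    catalog.vacuum_full("pg_am")    # concurrent VACUUM FULL pg_am commits
    try:
        return f"opened relfilenode {backend.open_relation('pg_am')}"
    except FileNotFoundError as e:
        return f"ERROR: {e}"


print(starting_backend_opens_pg_am(sinval_active=True))   # opened relfilenode 2602
print(starting_backend_opens_pg_am(sinval_active=False))  # ERROR: could not open file for relfilenode 2601
```

In this model the only difference between success and failure is whether the invalidation reaches the backend before it reuses its cached snapshot, which is the condition suspected above.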
For even more fun, try "VACUUM FULL pg_class" instead:
psql: PANIC: could not open critical system index 2662
regards, tom lane