From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | subscriptionCheck failures on nightjar |
Date: | 2019-02-11 06:31:23 |
Message-ID: | 17827.1549866683@sss.pgh.pa.us |
Lists: | pgsql-hackers |
nightjar just did this:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2019-02-11%2004%3A33%3A07
The critical bit seems to be that the publisher side of the
010_truncate.pl test failed like so:
2019-02-10 23:55:58.765 EST [40771] sub3 LOG: statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
2019-02-10 23:55:58.765 EST [40771] sub3 LOG: received replication command: CREATE_REPLICATION_SLOT "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2019-02-10 23:55:58.798 EST [40728] sub1 PANIC: could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory
2019-02-10 23:55:58.800 EST [40771] sub3 LOG: logical decoding found consistent point at 0/160B578
2019-02-10 23:55:58.800 EST [40771] sub3 DETAIL: There are no running transactions.
I'm not sure what to make of that, but I notice that nightjar has
failed subscriptionCheck seven times since mid-December, and every one
of those shows this same PANIC. Meanwhile, no other buildfarm member
has produced such a failure. It smells like a race condition with
a rather tight window, but that's just a guess.
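For anyone wanting to check other buildfarm logs for the same signature, here is a minimal sketch of a log scan. It is not an existing buildfarm tool: the function name, the regular expression, and the assumed log prefix layout (timestamp, [pid], application name) are all hypothetical and were derived only from the excerpt quoted above.

```python
import re

# Hypothetical pattern for the PANIC line quoted in this message.
# Assumes the log_line_prefix layout seen in the excerpt:
#   <date> <time> <tz> [<pid>] <application> <SEVERITY>: <message>
PANIC_RE = re.compile(
    r'^(?P<ts>\S+ \S+ \S+) \[(?P<pid>\d+)\] (?P<app>\S+) PANIC:\s+'
    r'could not open file "(?P<path>pg_logical/snapshots/[^"]+\.snap)": '
    r'No such file or directory'
)


def find_snapshot_panics(lines):
    """Return (timestamp, pid, application, path) for each matching PANIC line."""
    hits = []
    for line in lines:
        m = PANIC_RE.match(line.strip())
        if m:
            hits.append((m["ts"], int(m["pid"]), m["app"], m["path"]))
    return hits


# The five log lines from the excerpt above, used as sample input.
SAMPLE = [
    '2019-02-10 23:55:58.765 EST [40771] sub3 LOG: statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ',
    '2019-02-10 23:55:58.765 EST [40771] sub3 LOG: received replication command: CREATE_REPLICATION_SLOT "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT',
    '2019-02-10 23:55:58.798 EST [40728] sub1 PANIC: could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory',
    '2019-02-10 23:55:58.800 EST [40771] sub3 LOG: logical decoding found consistent point at 0/160B578',
    '2019-02-10 23:55:58.800 EST [40771] sub3 DETAIL: There are no running transactions.',
]

if __name__ == "__main__":
    for ts, pid, app, path in find_snapshot_panics(SAMPLE):
        print(ts, pid, app, path)
```

Feeding it the lines of a saved publisher log (for example via `open(path).readlines()`) would show whether a given run hit the same missing-snapshot PANIC. A different log_line_prefix would need a different pattern.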
So: (1) what's causing the failure? (2) could we respond with
something less than take-down-the-whole-database when a failure
happens in this area?
regards, tom lane