From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Bruce Momjian <bruce(at)momjian(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com> |
Subject: | Re: Improving connection scalability: GetSnapshotData() |
Date: | 2020-08-15 16:42:00 |
Message-ID: | 20200815164200.drok4tt5ju7ijivf@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2020-08-15 11:10:51 -0400, Tom Lane wrote:
> We have two essentially identical buildfarm failures since these patches
> went in:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2020-08-15%2011%3A27%3A32
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2020-08-15%2003%3A09%3A14
>
> They're both in the same place in the freeze-the-dead isolation test:
> TRAP: FailedAssertion("!TransactionIdPrecedes(members[i].xid, cutoff_xid)", File: "heapam.c", Line: 6051)
> 0x9613eb <ExceptionalCondition+0x5b> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x52d586 <heap_prepare_freeze_tuple+0x926> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x53bc7e <heap_vacuum_rel+0x100e> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x6949bb <vacuum_rel+0x25b> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x694532 <vacuum+0x602> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x693d1c <ExecVacuum+0x37c> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x8324b3
> ...
> 2020-08-14 22:16:41.783 CDT [78410:4] LOG: server process (PID 80395) was terminated by signal 6: Abort trap
> 2020-08-14 22:16:41.783 CDT [78410:5] DETAIL: Failed process was running: VACUUM FREEZE tab_freeze;
>
> peripatus has successes since this failure, so it's not fully reproducible
> on that machine. I'm suspicious of a timing problem in computing vacuum's
> cutoff_xid.
Hm, maybe it's something around what I observed in
https://www.postgresql.org/message-id/20200723181018.neey2jd3u7rfrfrn%40alap3.anarazel.de
I.e. that somehow we end up with hot pruning and freezing coming to a
different determination, and trying to freeze a hot tuple.
I'll try to add a few additional asserts here, and burn some cpu tests
trying to trigger the issue.
I gotta escape the heat in the house for a few hours though (no AC
here), so I'll not look at the results till later this afternoon, unless
it triggers soon.
> (I'm also wondering why the failing check is an Assert rather than a real
> test-and-elog. Assert doesn't seem like an appropriate way to check for
> plausible data corruption cases.)
Robert, and to a lesser degree you, gave me quite a bit of grief over
converting nearby asserts to elogs. I agree it'd be better if it were
an assert, but ...
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2020-08-15 16:59:13 | Re: run pgindent on a regular basis / scripted manner |
Previous Message | Tom Lane | 2020-08-15 15:10:51 | Re: Improving connection scalability: GetSnapshotData() |