From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: XLogReadRecord() error in XlogReadTwoPhaseData() |
Date: | 2021-11-17 22:47:10 |
Message-ID: | 2782601.1637189230@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Noah Misch <noah(at)leadboat(dot)com> writes:
> Tom Lane reported another instance today:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-11%2013%3A29%3A58
> Each of the three failures happened on a sparc64 Debian+gcc machine. I had
> tried ~8000 iterations on thorntail, another sparc64 Debian+gcc animal,
> without reproducing this.
>>> As a first step, let's report the actual XLogReadRecord() error message.
>>> Attached.
>> Good catch! This looks good.
> Pushed.
Well, we didn't have to wait too long [1]:
# at t/003_cic_2pc.pl line 143.
# 'pgbench: error: client 0 script 1 aborted in command 4 query 0: ERROR: could not read two-phase state from WAL at 0/159EF88: unexpected pageaddr 0/0 in log segment 000000010000000000000001, offset 5890048
# pgbench: error: client 2 script 3 aborted in command 2 query 0: ERROR: canceling statement due to lock timeout
# pgbench: fatal: Run was aborted; the above results are incomplete.
I suppose "unexpected pageaddr 0/0" is most easily explained by supposing
that XlogReadTwoPhaseData tried to read a WAL page that hadn't been
written out yet. Have we got any synchronization around that?
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-17%2013%3A01%3A24