Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-09-23 04:44:18
Message-ID: ZQ5tIoarrcwAlbVZ@paquier.xyz
Lists: pgsql-bugs

On Sat, Sep 23, 2023 at 02:02:02PM +1200, Thomas Munro wrote:
> Hmm, copperhead (riscv) showed an unusual failure, a segfault in
> suspiciously nearby code. I don't immediately know what that's about,
> let's see if we get more clues...

Yep, something is going on with the prefetching code:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=copperhead&dt=2023-09-22%2023%3A16%3A33

Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: paris: startup recovering 000000030000000000000003 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  pg_comp_crc32c_sb8 (crc=1613114916, crc@entry=4294967295, data=data@entry=0x2af9e00d48, len=<optimized out>) at pg_crc32c_sb8.c:56
56          uint32 a = *p4++ ^ crc;
#0  pg_comp_crc32c_sb8 (crc=1613114916, crc@entry=4294967295, data=data@entry=0x2af9e00d48, len=<optimized out>) at pg_crc32c_sb8.c:56
#1  0x0000002ad59a1536 in ValidXLogRecord (state=0x2af9db1fc0, record=0x2af9e00d30, recptr=50520048) at xlogreader.c:1195
#2  0x0000002ad59a285a in XLogDecodeNextRecord (state=state@entry=0x2af9db1fc0, nonblocking=<optimized out>) at xlogreader.c:842
#3  0x0000002ad59a28c0 in XLogReadAhead (state=0x2af9db1fc0, nonblocking=nonblocking@entry=false) at xlogreader.c:969
#4  0x0000002ad59a0996 in XLogPrefetcherNextBlock (pgsr_private=184580836680, lsn=0x2af9e14618) at xlogprefetcher.c:496
#5  0x0000002ad59a11c8 in lrq_prefetch (lrq=<optimized out>) at xlogprefetcher.c:256
#6  lrq_complete_lsn (lsn=<optimized out>, lrq=0x2af9e145c8) at xlogprefetcher.c:294
#7  XLogPrefetcherReadRecord (prefetcher=prefetcher@entry=0x2af9e00d48, errmsg=errmsg@entry=0x3fec9a2bf0) at xlogprefetcher.c:1041
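For reference, the addresses in frames #0 and #1 fit together: data (0x2af9e00d48) is 0x18 = 24 bytes past record (0x2af9e00d30), which matches the 24-byte fixed XLogRecord header, and the entry value of crc, 4294967295, is 0xFFFFFFFF, the usual CRC-32C starting value. That appears consistent with the fault happening in the first CRC pass over the record body, whose length is taken from the record's own total-length field. The sketch below models that kind of check under stated assumptions: the DemoRecordHeader layout and all demo_* names are hypothetical, and the CRC is computed bitwise rather than with the slicing-by-8 tables pg_comp_crc32c_sb8 uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * A slow stand-in for a table-driven implementation such as slicing-by-8. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical 24-byte record header with the CRC stored at offset 20.
 * Assumption for illustration only; not PostgreSQL's actual struct. */
typedef struct DemoRecordHeader {
    uint32_t tot_len;   /* header + payload length, read from the record itself */
    uint32_t xid;
    uint64_t prev;
    uint8_t  info;
    uint8_t  rmid;
    uint8_t  pad[2];
    uint32_t crc;
} DemoRecordHeader;

#define DEMO_HEADER_SIZE sizeof(DemoRecordHeader)

/* Record CRC: payload first, then the header bytes that precede the crc field. */
static uint32_t demo_record_crc(const unsigned char *rec, uint32_t tot_len)
{
    assert(tot_len >= DEMO_HEADER_SIZE);   /* caller must guarantee this */
    uint32_t crc = 0xFFFFFFFFu;            /* initial value */
    crc = crc32c_update(crc, rec + DEMO_HEADER_SIZE, tot_len - DEMO_HEADER_SIZE);
    crc = crc32c_update(crc, rec, offsetof(DemoRecordHeader, crc));
    return crc ^ 0xFFFFFFFFu;              /* final XOR */
}

/* Validate a record held in a buffer of buf_len readable bytes.  tot_len comes
 * from the record itself, so without the bound below a garbage header would
 * send the CRC loop past the end of the buffer. */
static bool demo_valid_record(const unsigned char *rec, size_t buf_len)
{
    DemoRecordHeader hdr;

    if (buf_len < DEMO_HEADER_SIZE)
        return false;
    memcpy(&hdr, rec, sizeof hdr);
    if (hdr.tot_len < DEMO_HEADER_SIZE || hdr.tot_len > buf_len)
        return false;
    return demo_record_crc(rec, hdr.tot_len) == hdr.crc;
}
```

Folding the header in after the payload mirrors the order ValidXLogRecord appears to use. The buf_len bound in demo_valid_record only illustrates why a bogus length would fault inside the CRC loop; it says nothing about where PostgreSQL's own length checks live or which one was bypassed here.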

The stack may point to a different issue, but perhaps we now return
XLREAD_SUCCESS where we previously returned XLREAD_FAIL, so this code
wrongly treats the block as valid when it is not?
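A toy illustration of that hypothesis, assuming nothing about PostgreSQL's real reader: if a caller branches on the status code alone, flipping one path from FAIL to SUCCESS is enough to make it consume a record that is not fully present. The DemoReadResult enum, DemoReadOutcome struct and both caller functions are hypothetical names loosely modelled on the XLREAD_* codes mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes, loosely modelled on XLREAD_SUCCESS/XLREAD_FAIL;
 * not PostgreSQL's actual definitions. */
typedef enum { DEMO_READ_SUCCESS, DEMO_READ_FAIL } DemoReadResult;

/* Hypothetical read-ahead outcome: a status plus how many bytes of the
 * requested record are actually present in the buffer. */
typedef struct {
    DemoReadResult status;
    size_t bytes_available;
} DemoReadOutcome;

/* Naive caller: trusts the status alone, so a path that reports SUCCESS
 * without filling the buffer leads it to decode an incomplete record. */
static bool naive_can_decode(DemoReadOutcome r, size_t needed)
{
    (void) needed;
    return r.status == DEMO_READ_SUCCESS;
}

/* Defensive caller: also checks that the bytes it is about to read exist. */
static bool defensive_can_decode(DemoReadOutcome r, size_t needed)
{
    return r.status == DEMO_READ_SUCCESS && r.bytes_available >= needed;
}
```

With a short read reported as success, naive_can_decode says yes while defensive_can_decode says no; the point is only that the status and the amount of valid data are separate facts that a caller has to agree on.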
--
Michael
