BUG #16922: In cascading replication, a standby server aborted when an upstream standby server promoted

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: egashira(dot)yusuke(at)fujitsu(dot)com
Subject: BUG #16922: In cascading replication, a standby server aborted when an upstream standby server promoted
Date: 2021-03-11 08:05:34
Message-ID: 16922-3d1488a219f53da4@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16922
Logged by: yusuke egashira
Email address: egashira(dot)yusuke(at)fujitsu(dot)com
PostgreSQL version: 13.2
Operating system: Red Hat Enterprise Linux 8
Description:

Hi.

My customer uses PostgreSQL with cascading replication.
The replication between "NODE-A" and "NODE-B" is synchronous, and the
replication between "NODE-B" and "NODE-C" is asynchronous.

"NODE-A" <-[synchronous]-> "NODE-B" <-[asynchronous]-> "NODE-C"

When the primary server "NODE-A" crashed because its WAL storage filled up
and "NODE-B" was promoted, the downstream standby server "NODE-C" aborted
with the following messages.

2021-03-11 11:26:28.470 JST [85228] LOG: invalid contrecord length 26 at 0/5FFFFF0
2021-03-11 11:26:28.470 JST [85232] FATAL: terminating walreceiver process due to administrator command
2021-03-11 11:26:28.470 JST [85228] PANIC: could not open file "pg_wal/000000020000000000000005": No such file or directory
2021-03-11 11:26:28.492 JST [85260] LOG: started streaming WAL from primary at 0/5000000 on timeline 2
2021-03-11 11:26:29.260 JST [85227] LOG: startup process (PID 85228) was terminated by signal 6: Aborted
2021-03-11 11:26:29.260 JST [85227] LOG: terminating any other active server processes
2021-03-11 11:26:29.290 JST [85227] LOG: database system is shut down
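
For reference, the segment named in the PANIC message can be checked for directly under NODE-C's pg_wal directory. This is only a minimal sketch, assuming the /tmp/nodec data directory used in the reproduction procedure below:

# List the timeline-2 WAL files NODE-C holds at crash time
# (path taken from the reproduction procedure below; adjust as needed).
ls /tmp/nodec/pg_wal/ | grep ^00000002
# The segment named in the PANIC message is not present.
test -f /tmp/nodec/pg_wal/000000020000000000000005 \
  || echo "000000020000000000000005 is missing"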

I found this behavior in PostgreSQL 13.2, 12.6, 11.11, 10.16, 9.6.21 and
9.5.24.

The reproduction procedure for PostgreSQL 13.2 on RHEL 8 is:

===========================
# Create a small temporary filesystem for WAL data.
mkdir -p /tmp/wal
dd if=/dev/zero of=/tmp/waldata bs=100MB count=1
sudo mkfs -t xfs /tmp/waldata
sudo mount -t xfs -o loop /tmp/waldata /tmp/wal
sudo chown postgres:postgres /tmp/wal
# mkdir -p /tmp/wal/pg_wal

# Create primary NODE-A.
initdb -D /tmp/nodea -X /tmp/wal/pg_wal

cat >>/tmp/nodea/postgresql.conf <<_EOF_
port=5432
synchronous_standby_names = 'nodeb'
restart_after_crash = off
wal_keep_size = 10GB
_EOF_

pg_ctl -D /tmp/nodea start -w -l /tmp/nodea_logfile

# Create standby NODE-B.
pg_basebackup -D /tmp/nodeb -R -X fetch -d "postgresql:///?port=5432&application_name=nodeb"
echo "port = 5433" >> /tmp/nodeb/postgresql.conf
pg_ctl -D /tmp/nodeb start -w -l /tmp/nodeb_logfile

# Create standby NODE-C.
pg_basebackup -D /tmp/nodec -p 5433 -R -X fetch -d "postgresql:///?port=5433&application_name=nodec"
echo "port = 5434" >> /tmp/nodec/postgresql.conf
pg_ctl -D /tmp/nodec start -w

# Run SQL to fill up the WAL device.
createdb -p 5432
cat <<_psql | psql -p 5432
create table test (a int, b text);
insert into test values (generate_series(1,10000000), 'abcdefghijk');
_psql

# ** After the primary NODE-A has gone down **
pg_ctl -D /tmp/nodeb promote

# At this point, NODE-C goes down.
===========================
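
To confirm the failure on NODE-C after the promotion, checks like the following can be used. This is only a sketch: it assumes the /tmp/nodec directory from the procedure above, and the log file path is an example (NODE-C was started without -l above, so its messages go to the terminal unless a log file is added).

# The NODE-C postmaster is gone after the abort.
pg_ctl -D /tmp/nodec status
# The control file shows the cluster was left in recovery.
pg_controldata /tmp/nodec | grep "Database cluster state"
# If NODE-C was started with e.g. -l /tmp/nodec_logfile, the failure is visible there.
grep -E "PANIC|invalid contrecord" /tmp/nodec_logfile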

The stack trace of the core dump, taken with a debug build, is as follows.
(Note: this stack trace was taken in a different environment than the other
attached logs.)

#0 0x00007f73e3f1f8df in raise () from /lib64/libc.so.6
#1 0x00007f73e3f09cf5 in abort () from /lib64/libc.so.6
#2 0x0000000000878c83 in errfinish (filename=<optimized out>, filename@entry=0x8cc67e "xlog.c", lineno=lineno@entry=3741, funcname=funcname@entry=0x8d45c8 <__func__.28536> "XLogFileRead") at elog.c:592
#3 0x0000000000507730 in XLogFileRead (segno=<optimized out>, tli=2, source=source@entry=XLOG_FROM_STREAM, notfoundOk=notfoundOk@entry=false, emode=22) at xlog.c:3739
#4 0x0000000000509c62 in WaitForWALToBecomeAvailable (tliRecPtr=134217712, fetching_ckpt=<optimized out>, randAccess=false, RecPtr=117448704) at xlog.c:12411
#5 XLogPageRead (xlogreader=0x282b478, targetPagePtr=<optimized out>, reqLen=8192, targetRecPtr=134217712, readBuf=0x2858af8 "\006\321\a") at xlog.c:11908
#6 0x0000000000517e25 in ReadPageInternal (state=state@entry=0x282b478, pageptr=pageptr@entry=134209536, reqLen=reqLen@entry=8192) at xlogreader.c:608
#7 0x000000000051886a in XLogReadRecord (state=state@entry=0x282b478, errormsg=errormsg@entry=0x7ffd148db3f8) at xlogreader.c:329
#8 0x000000000050797e in ReadRecord (xlogreader=xlogreader@entry=0x282b478, emode=emode@entry=15, fetching_ckpt=fetching_ckpt@entry=false) at xlog.c:4346
#9 0x000000000050e205 in ReadRecord (fetching_ckpt=false, emode=15, xlogreader=0x282b478) at xlog.c:4340
#10 StartupXLOG () at xlog.c:7385
#11 0x00000000006f72d1 in StartupProcessMain () at startup.c:204
#12 0x000000000051cf5b in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffd148df240) at bootstrap.c:443
#13 0x00000000006f343f in StartChildProcess (type=StartupProcess) at postmaster.c:5492
#14 0x00000000006f6938 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x282acc0) at postmaster.c:1404
#15 0x000000000047ecd9 in main (argc=3, argv=0x282acc0) at main.c:210
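
(A backtrace like the one above can be taken from the startup process's core file with gdb against a debug build, roughly as follows; the postgres binary and core file paths are placeholders only.)

# Placeholder paths: adjust to the installation and the core_pattern in use.
gdb -batch -ex bt /path/to/bin/postgres /path/to/core.<startup_pid>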

I also attach the debug5-level log of NODE-C.

2021-03-11 11:06:39.136 JST [84342] LOG: replication terminated by primary server
2021-03-11 11:06:39.136 JST [84342] DETAIL: End of WAL reached on timeline 1 at 0/6000000.
2021-03-11 11:06:39.140 JST [84342] LOG: fetching timeline history file for timeline 2 from primary server
2021-03-11 11:06:39.148 JST [84342] DEBUG: walreceiver ended streaming and awaits new instructions
2021-03-11 11:06:39.148 JST [84338] LOG: new target timeline is 2
2021-03-11 11:06:39.148 JST [84338] DEBUG: switched WAL source from stream to archive after failure
2021-03-11 11:06:39.148 JST [84338] DEBUG: could not open file "pg_wal/000000020000000000000006": No such file or directory
2021-03-11 11:06:39.148 JST [84338] DEBUG: switched WAL source from archive to stream after failure
2021-03-11 11:06:39.148 JST [84342] LOG: restarted WAL streaming at 0/6000000 on timeline 2
2021-03-11 11:06:39.148 JST [84342] DEBUG: sendtime 2021-03-11 11:06:39.148674+09 receipttime 2021-03-11 11:06:39.148723+09 replication apply delay 17910 ms transfer latency 1 ms
2021-03-11 11:06:39.148 JST [84342] DEBUG: creating and filling new WAL file
2021-03-11 11:06:39.240 JST [84342] DEBUG: done creating and filling new WAL file
2021-03-11 11:06:39.240 JST [84342] DEBUG: sending write 0/6000048 flush 0/5FFFFF0 apply 0/5FFFFF0
2021-03-11 11:06:39.242 JST [84342] DEBUG: sending write 0/6000048 flush 0/6000048 apply 0/5FFFFF0
2021-03-11 11:06:39.486 JST [84342] DEBUG: sendtime 2021-03-11 11:06:39.486387+09 receipttime 2021-03-11 11:06:39.486449+09 replication apply delay 18248 ms transfer latency 1 ms
2021-03-11 11:06:39.486 JST [84342] DEBUG: sending write 0/6000080 flush 0/6000048 apply 0/5FFFFF0
2021-03-11 11:06:39.486 JST [84342] DEBUG: sending write 0/6000080 flush 0/6000080 apply 0/5FFFFF0
2021-03-11 11:06:39.489 JST [84338] LOG: invalid contrecord length 26 at 0/5FFFFF0
2021-03-11 11:06:39.489 JST [84342] FATAL: terminating walreceiver process due to administrator command
2021-03-11 11:06:39.489 JST [84342] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG: proc_exit(1): 2 callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG: exit(1)
2021-03-11 11:06:39.489 JST [84342] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-03-11 11:06:39.489 JST [84342] DEBUG: proc_exit(-1): 0 callbacks to make
2021-03-11 11:06:39.489 JST [84338] DEBUG: switched WAL source from stream to archive after failure
2021-03-11 11:06:39.489 JST [84338] DEBUG: could not open file "pg_wal/000000020000000000000005": No such file or directory
2021-03-11 11:06:39.489 JST [84338] DEBUG: switched WAL source from archive to stream after failure
2021-03-11 11:06:39.489 JST [84338] PANIC: could not open file "pg_wal/000000020000000000000005": No such file or directory
2021-03-11 11:06:39.501 JST [84337] DEBUG: reaping dead processes
2021-03-11 11:06:39.513 JST [84367] DEBUG: find_in_dynamic_libpath: trying "/home/egashira/pgsql/lib/libpqwalreceiver"
2021-03-11 11:06:39.513 JST [84367] DEBUG: find_in_dynamic_libpath: trying "/home/egashira/pgsql/lib/libpqwalreceiver.so"
2021-03-11 11:06:39.541 JST [84367] LOG: started streaming WAL from primary at 0/5000000 on timeline 2
2021-03-11 11:06:39.542 JST [84367] DEBUG: sendtime 2021-03-11 11:06:39.536374+09 receipttime 2021-03-11 11:06:39.54219+09 replication apply delay 56 ms transfer latency 6 ms
2021-03-11 11:06:39.542 JST [84367] DEBUG: creating and filling new WAL file
2021-03-11 11:06:39.696 JST [84367] DEBUG: done creating and filling new WAL file
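
(For reference, the debug5 level used for this log can be enabled on NODE-C with a setting like the following before starting the node; log_min_messages is the relevant parameter.)

cat >>/tmp/nodec/postgresql.conf <<_EOF_
log_min_messages = debug5
_EOF_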

In addition, this issue does not always occur; it depends on the timing.
When the issue did not occur, the last WAL record of NODE-A had not been
transferred to NODE-B.
I attach the pg_waldump results for NODE-A, NODE-B and NODE-C.
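
An equivalent way to produce the comparison below in one pass, as a sketch (it assumes the /tmp data directories from the reproduction procedure above; the outputs below were taken with the individual commands shown):

# Compare the tail of the same WAL segment across the three nodes.
for n in nodea nodeb nodec; do
    echo "== $n =="
    pg_waldump /tmp/$n/pg_wal/000000010000000000000005 | tail -n 4
done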

[Reproduced]
$ pg_waldump nodea/pg_wal/000000010000000000000005 | tail -n 4
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759

$ pg_waldump nodeb/pg_wal/000000010000000000000005 | tail -n 4
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759

$ pg_waldump nodec/pg_wal/000000010000000000000005 | tail -n 4
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759

[Not reproduced]
$ pg_waldump nodea/pg_wal/000000010000000000000005 | tail -n 4
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFED0, prev 0/05FFFE88, desc: INSERT off 181 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF18, prev 0/05FFFED0, desc: INSERT off 182 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFF60, prev 0/05FFFF18, desc: INSERT off 183 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05FFFFA8, prev 0/05FFFF60, desc: INSERT off 184 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3759

$ pg_waldump nodeb/pg_wal/000000010000000000000005 | tail -n 4
pg_waldump: fatal: error in WAL record at 0/5BFFF88: invalid magic number 0000 in log segment 000000010000000000000005, offset 12582912
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFEB0, prev 0/05BFFE68, desc: INSERT off 2 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFEF8, prev 0/05BFFEB0, desc: INSERT off 3 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFF40, prev 0/05BFFEF8, desc: INSERT off 4 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFF88, prev 0/05BFFF40, desc: INSERT off 5 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446

$ pg_waldump nodec/pg_wal/000000010000000000000005 | tail -n 4
pg_waldump: fatal: error in WAL record at 0/5BFFF88: invalid magic number 0000 in log segment 000000010000000000000005, offset 12582912
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFEB0, prev 0/05BFFE68, desc: INSERT off 2 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFEF8, prev 0/05BFFEB0, desc: INSERT off 3 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFF40, prev 0/05BFFEF8, desc: INSERT off 4 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446
rmgr: Heap len (rec/tot): 71/ 71, tx: 487, lsn: 0/05BFFF88, prev 0/05BFFF40, desc: INSERT off 5 flags 0x00, blkref #0: rel 1663/16384/16385 blk 3446
