From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | mb(at)dataegret(dot)com |
Subject: | BUG #15151: Error with wal replay after planned manual switchover. |
Date: | 2018-04-12 12:20:09 |
Message-ID: | 152353560977.31233.11406243320876395727@wrigleys.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 15151
Logged by: Maxim Boguk
Email address: mb(at)dataegret(dot)com
PostgreSQL version: 9.6.8
Operating system: Linux
Description:
Hi,
Today after planned manual switch-over (via remove all load from master,
checkpoint master, wait replica in sync, stop master, stop replica, switch
master and replica roles without promote), I got very strange error on new
replica (log_min_messages=debug5):
2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: database
system was shut down in recovery at 2018-04-12 04:23:26 MSK
2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: entering
standby mode
2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid
resource manager ID 211 at C7BB/EDC42C68
2018-04-12 04:23:28 MSK 32060 rohh(dot)app(at)hh from 192.168.1.2 [vxid: txid:0] []
FATAL: the database system is starting up
2018-04-12 04:23:28 MSK 32061 rohh(dot)app(at)hh from [local] [vxid: txid:0] []
FATAL: the database system is starting up
2018-04-12 04:23:28 MSK 32063 [unknown](at)[unknown] from [local] [vxid:
txid:0] [] LOG: incomplete startup packet
2018-04-12 04:23:28 MSK 32064 postgres(at)postgres from [local] [vxid: txid:0]
[] FATAL: the database system is starting up
2018-04-12 04:23:28 MSK 32062 @ from [vxid: txid:0] [] LOG: started
streaming WAL from primary at C7BB/ED000000 on timeline 1
2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid
resource manager ID 211 at C7BB/EDC42C68
2018-04-12 04:23:30 MSK 32062 @ from [vxid: txid:0] [] FATAL: terminating
walreceiver process due to administrator command
2018-04-12 04:23:30 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid
resource manager ID 211 at C7BB/EDC42C68
2018-04-12 04:23:30 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid
resource manager ID 211 at C7BB/EDC42C68
and so on.
Start with log_min_messages=debug5 - doesn't add any useful information.
Restart of new slave doesn't help.
Stop new slave, manual clean xlog directory on slave and start slave -
doesn't help (e.g. it doesn't look like the single wal file was damaged
during network transfer).
Server offline at this moment and open for experiments for few days.
pg_xlogdump show no errors in wal files:
postgres(at)db-hh3:~$ /usr/lib/postgresql/9.6/bin/pg_xlogdump
--start=C7BB/EDC42C68
/var/lib/postgresql/9.6/main/pg_xlog/000000010000C7BB000000ED | less
first record is after C7BB/EDC42C68, at C7BB/EDC433B8, skipping over 1872
bytes
rmgr: XLOG len (rec/tot): 51/ 2556, tx: 0, lsn:
C7BB/EDC433B8, prev C7BB/EDC42928, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/18105 blk 326257 FPW
rmgr: XLOG len (rec/tot): 51/ 3140, tx: 0, lsn:
C7BB/EDC43DB8, prev C7BB/EDC433B8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/150046445 blk 32094 FPW
rmgr: XLOG len (rec/tot): 51/ 2595, tx: 0, lsn:
C7BB/EDC44A18, prev C7BB/EDC43DB8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/18105 blk 326250 FPW
rmgr: XLOG len (rec/tot): 51/ 3130, tx: 0, lsn:
C7BB/EDC45440, prev C7BB/EDC44A18, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/150046445 blk 15989 FPW
rmgr: XLOG len (rec/tot): 51/ 2590, tx: 0, lsn:
C7BB/EDC46098, prev C7BB/EDC45440, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/18105 blk 326298 FPW
Few records before C7BB/EDC42C68:
rmgr: XLOG len (rec/tot): 51/ 2520, tx: 0, lsn:
C7BB/EDC41F38, prev C7BB/EDC41348, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/18105 blk 326155 FPW
rmgr: XLOG len (rec/tot): 51/ 2703, tx: 0, lsn:
C7BB/EDC42928, prev C7BB/EDC41F38, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/18105 blk 326262 FPW
rmgr: XLOG len (rec/tot): 51/ 2556, tx: 0, lsn:
C7BB/EDC433B8, prev C7BB/EDC42928, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/18105 blk 326257 FPW
rmgr: XLOG len (rec/tot): 51/ 3140, tx: 0, lsn:
C7BB/EDC43DB8, prev C7BB/EDC433B8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/150046445 blk 32094 FPW
rmgr: XLOG len (rec/tot): 51/ 2595, tx: 0, lsn:
C7BB/EDC44A18, prev C7BB/EDC43DB8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16400/18105 blk 326250 FPW
pg_xlogdump show no visible to me anomalies in 000000010000C7BB000000ED or
000000010000C7BB000000EC files.
Any additional information I'll try provide, but my experience with internal
wal structure outside of using pg_xlogdump output very limited.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2018-04-12 22:14:03 | Re: BUG #15150: Reading uninitialised value in NISortAffixes (tsearch/spell.c) |
Previous Message | Arthur Zakirov | 2018-04-12 11:56:39 | Re: BUG #15150: Reading uninitialised value in NISortAffixes (tsearch/spell.c) |