Unexpected WAL-archive restore behaviour

From: Nikolay Petrov <nik(dot)petrov(dot)ua(at)yandex(dot)ru>
To: pgsql-general(at)postgresql(dot)org
Subject: Unexpected WAL-archive restore behaviour
Date: 2017-02-18 21:01:42
Message-ID: 76D18D07-2BF2-4E8E-B1FD-7CA861F48645@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello everyone.
I use postgresql 9.6.1 running under a debian 8 (master - slave hot standby streaming replication).
I found out unexpected WAL-archive restore behaviour.

master:
wal_level = replica
archive_mode = on
archive_command = 'gzip < %p > /var/lib/postgres/archives/%f'

slave recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=192.168.50.96 port=5432 user=replication password=password'
primary_slot_name = 'myslot'
restore_command = 'ssh postgres(at)192(dot)168(dot)50(dot)96 \'cat /var/lib/postgres/archives/%f\' | gzip -d -c > %p'
recovery_target_timeline = 'latest'

It was necessary to add replication delay (recovery_min_apply_delay = 30min).

Hot standby server was started with new recovery.conf, reached consistent recovery state and resumed to restore WAL segments from archive. When couple of next wal segments became unavailable slave server just "skipped" them, instead to start streaming WAL from primary:

2017-02-16 10:54:22 MSK [5956-1] LOG: database system was shut down in recovery at 2017-02-16 10:19:01 MSK
2017-02-16 10:54:22 MSK [5957-1] [unknown](at)[unknown] LOG: incomplete startup packet
cat: /var/lib/postgres/archives/00000003.history: No such file or directory

gzip: stdin: unexpected end of file
2017-02-16 10:54:22 MSK [5956-2] LOG: entering standby mode
cat: /var/lib/postgres/archives/00000002.history: No such file or directory

gzip: stdin: unexpected end of file
2017-02-16 10:54:22 MSK [5969-1] postgres(at)postgres FATAL: the database system is starting up
2017-02-16 10:54:23 MSK [5956-3] LOG: restored log file "00000002000012A2000000D0" from archive
2017-02-16 10:54:23 MSK [5977-1] postgres(at)postgres FATAL: the database system is starting up
2017-02-16 10:54:23 MSK [5956-4] LOG: restored log file "00000002000012A2000000B5" from archive
2017-02-16 10:54:23 MSK [5956-5] LOG: redo starts at 12A2/B5ECC5C0
2017-02-16 10:54:23 MSK [5956-6] LOG: restored log file "00000002000012A2000000B6" from archive

...

2017-02-16 10:54:48 MSK [5956-55] LOG: restored log file "00000002000012A2000000E7" from archive
2017-02-16 10:54:48 MSK [5956-56] LOG: restored log file "00000002000012A2000000E8" from archive
2017-02-16 10:54:48 MSK [5956-57] LOG: consistent recovery state reached at 12A2/E81D86C8
2017-02-16 10:54:48 MSK [5954-3] LOG: database system is ready to accept read only connections
2017-02-16 10:55:07 MSK [6233-1] [unknown](at)[unknown] LOG: incomplete startup packet

...

2017-02-16 11:27:37 MSK [5956-115] LOG: restored log file "00000002000012A300000022" from archive
2017-02-16 11:27:38 MSK [5956-116] LOG: restored log file "00000002000012A300000023" from archive
2017-02-16 11:27:59 MSK [5956-117] LOG: restored log file "00000002000012A300000024" from archive
cat: /var/lib/postgres/archives/00000002000012A300000025: No such file or directory

gzip: stdin: unexpected end of file
cat: /var/lib/postgres/archives/00000002000012A300000026: No such file or directory

gzip: stdin: unexpected end of file
cat: /var/lib/postgres/archives/00000002000012A300000027: No such file or directory

gzip: stdin: unexpected end of file
2017-02-16 11:28:37 MSK [5956-118] LOG: restored log file "00000002000012A300000028" from archive
2017-02-16 11:28:39 MSK [5956-119] LOG: restored log file "00000002000012A300000029" from archive
2017-02-16 11:28:59 MSK [5956-120] LOG: restored log file "00000002000012A30000002A" from archive

"Skipped" WAL segments looks like normal WAL archives:

postgres(at)myhost:/var/lib/postgres/archives/tmpwal$ /usr/lib/postgresql/9.6/bin/pg_xlogdump -z ./00000002000012A300000026
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---
XLOG 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Transaction 34 (0.43) 1472 (0.56) 0 (0.00) 1472 (0.01)
Storage 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
CLOG 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Database 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Tablespace 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
MultiXact 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
RelMap 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Standby 2 (0.03) 184 (0.07) 0 (0.00) 184 (0.00)
Heap2 673 (8.51) 21544 (8.25) 2057620 (12.90) 2079164 (12.83)
Heap 4854 (61.35) 171361 (65.59) 7642192 (47.92) 7813553 (48.20)
Btree 2330 (29.45) 63248 (24.21) 6248364 (39.18) 6311612 (38.94)
Hash 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Gin 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Gist 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Sequence 19 (0.24) 3458 (1.32) 0 (0.00) 3458 (0.02)
SPGist 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
BRIN 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
CommitTs 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
ReplicationOrigin 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Generic 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
LogicalMessage 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
-------- -------- -------- --------
Total 7912 261267 [1.61%] 15948176 [98.39%] 16209443 [100%]

Is it normal behaviour? Can it cause standby consistency damage?
Thank you.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tomas Vondra 2017-02-18 23:06:43 Re: Unexpected WAL-archive restore behaviour
Previous Message David G. Johnston 2017-02-18 16:38:52 Re: [GENERAL] How tö select a column?