archive_command called for preallocated/recycled WAL?

From: marty kulma <martykulma(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: archive_command called for preallocated/recycled WAL?
Date: 2016-12-21 14:53:06
Message-ID: CAH9_O4RGeW4uorJAvn6-WBZLdd=9Q_Dp668=LPpZzT2EseRo6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

We are running a PITR tests on Postgresql 9.5.4 (moving up from 9.1.18) and
finding that postgres seems to be archiving an invalid WAL (full of nulls)
in cases when a WAL is preallocated on shutdown. Don't see this issue
without preallocation during restartpoint. We use this process on 9.1.18
and have not seen this issue.

I haven't found mention of this in the mail lists. Has anyone seen this
behavior? Is it safe to remove archived WALs that are null if they overlap
same segment in previous timeline?

System is recovered - replay of WALs completes and postgres is shutdown.
As part of shutdown, restartpoint is created and a new WAL is preallocated.
2016-12-21 09:10:27.458 EST 12079 LOG: database system was
interrupted; last known up at 2016-12-20 06:27:13 EST
2016-12-21 09:10:27.658 EST 12079 LOG: restored log file
"00000008.history" from archive
2016-12-21 09:10:27.833 EST 12079 LOG: entering standby mode
2016-12-21 09:10:27.999 EST 12079 LOG: restored log file
"00000008.history" from archive
2016-12-21 09:10:28.393 EST [local] postgres 12113 FATAL: the database
system is starting up
2016-12-21 09:10:28.405 EST 12079 LOG: restored log file
"000000070000000000000076" from archive
2016-12-21 09:10:28.577 EST 12079 LOG: restored log file
"00000007.history" from archive
2016-12-21 09:10:28.580 EST 12079 LOG: redo starts at 0/760000D0
2016-12-21 09:10:28.582 EST 12079 LOG: consistent recovery state
reached at 0/7600AC70
2016-12-21 09:10:28.583 EST 12077 LOG: database system is ready to
accept read only connections
2016-12-21 09:10:28.977 EST 12079 LOG: restored log file
"000000070000000000000077" from archive
2016-12-21 09:10:29.413 EST 12079 LOG: restored log file
"000000070000000000000078" from archive
2016-12-21 09:10:29.819 EST 12079 LOG: restored log file
"000000070000000000000079" from archive
2016-12-21 09:10:30.059 EST 12079 LOG: restored log file
"00000008000000000000007A" from archive
2016-12-21 09:10:33.296 EST 12175 FATAL: could not connect to the
primary server: could not connect to server: No route to host
Is the server running on host "172.10.10.200" and
accepting
TCP/IP connections on port 5434?

2016-12-21 09:10:37.659 EST 12077 LOG: received fast shutdown request
2016-12-21 09:10:37.660 EST 12077 LOG: aborting any active transactions
2016-12-21 09:10:37.661 EST 12120 LOG: shutting down
2016-12-21 09:10:37.662 EST 12120 LOG: restartpoint starting: shutdown
immediate
2016-12-21 09:10:37.705 EST 12120 LOG: restartpoint complete: wrote 65
buffers (0.0%); 1 transaction log file(s) added, 0 removed, 0 recycled;
write=0.010 s, sync=0.003 s, total=0.043 s; sync files=42, longest=0.000 s,
average=0.000 s; distance=65535 kB, estimate=65535 kB
2016-12-21 09:10:37.705 EST 12120 LOG: recovery restart point at
0/7A0000C8
2016-12-21 09:10:37.705 EST 12120 DETAIL: last completed transaction
was at log time 2016-12-20 07:21:56.032147-05
2016-12-21 09:10:37.705 EST 12120 LOG: database system is shut down

postgres has preallocated 00000008000000000000007B (inode 38797329) as part
of shutdown
38797329 -rw------- 1 postgres postgres 16777216 2016-12-21
09:10:37.696096222 -0500 00000008000000000000007B

Postgres is started and promoted.
2016-12-21 09:10:49.024 EST 12282 LOG: database system was shut down
in recovery at 2016-12-21 09:10:37 EST
2016-12-21 09:10:49.210 EST 12282 LOG: entering standby mode
2016-12-21 09:10:49.400 EST 12282 LOG: restored log file
"00000008.history" from archive
2016-12-21 09:10:49.639 EST 12282 LOG: restored log file
"00000008000000000000007A" from archive
2016-12-21 09:10:49.654 EST 12282 LOG: redo starts at 0/7A0000C8
2016-12-21 09:10:49.654 EST 12282 LOG: consistent recovery state
reached at 0/7B000000
2016-12-21 09:10:49.655 EST 12280 LOG: database system is ready to
accept read only connections
2016-12-21 09:10:49.820 EST 12282 LOG: invalid magic number 0000 in
log segment 00000008000000000000007B, offset 0
2016-12-21 09:10:52.880 EST 12313 FATAL: could not connect to the
primary server: could not connect to server: No route to host
Is the server running on host "172.10.10.200" and
accepting
TCP/IP connections on port 5434?

2016-12-21 09:10:55.548 EST 12282 LOG: received promote request
2016-12-21 09:10:55.548 EST 12350 FATAL: terminating walreceiver
process due to administrator command
2016-12-21 09:10:55.905 EST 12282 LOG: redo done at 0/7A0001A8
2016-12-21 09:10:56.141 EST 12282 LOG: restored log file
"00000008000000000000007A" from archive
2016-12-21 09:10:56.348 EST 12282 LOG: selected new timeline ID: 9
2016-12-21 09:10:56.515 EST 12282 LOG: restored log file
"00000008.history" from archive
2016-12-21 09:10:56.546 EST 12282 LOG: archive recovery complete
2016-12-21 09:10:56.547 EST 12282 LOG: MultiXact member wraparound
protections are now enabled
2016-12-21 09:10:56.547 EST 12304 LOG: checkpoint starting: force

00000008000000000000007B appears to get recycled to
00000009000000000000007A and archived after promote.
38797329 -rw------- 1 postgres postgres 16777216 2016-12-21
09:10:37.696096222 -0500 00000009000000000000007A

We performed a PITR with recovery_target_timeline = latest. PITR
retrieves 00000009000000000000007A, but fails to recover as the history
file and WAL don't line up. Removing 00000009000000000000007A corrects the
issue and PITR succeeds.
2016-12-21 09:19:42.608 EST 13030 LOG: restored log file
"00000009000000000000007A" from archive
2016-12-21 09:19:42.616 EST 13030 LOG: invalid magic number 0000 in
log segment 00000009000000000000007A, offset 0
2016-12-21 09:19:42.616 EST 13030 FATAL: according to history file,
WAL location 0/7A000000 belongs to timeline 7, but previous recovered WAL
file came from timeline 9

# pg_xlogdump 00000007000000000000007A
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn:
0/7A000028, prev 0/790000D0, desc: CHECKPOINT_SHUTDOWN redo 0/7A000028; tli
7; prev tli 7; fpw true; xid 0/450172; oid 37074; multi 1; offset 0; oldest
xid 1869 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp
xid: 0/0; oldest running xid 0; shutdown
pg_xlogdump: FATAL: error in WAL record at 0/7A000028: invalid record
length at 0/7A000098

# pg_xlogdump 00000008000000000000007A
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn:
0/7A000028, prev 0/790000D0, desc: CHECKPOINT_SHUTDOWN redo 0/7A000028; tli
7; prev tli 7; fpw true; xid 0/450172; oid 37074; multi 1; offset 0; oldest
xid 1869 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp
xid: 0/0; oldest running xid 0; shutdown
rmgr: XLOG len (rec/tot): 16/ 42, tx: 0, lsn:
0/7A000098, prev 0/7A000028, desc: END_OF_RECOVERY tli 8; prev tli 7; time
2016-12-20 08:39:29.793253 EST
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn:
0/7A0000C8, prev 0/7A000098, desc: RUNNING_XACTS nextXid 450172
latestCompletedXid 450171 oldestRunningXid 450172
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn:
0/7A000100, prev 0/7A0000C8, desc: CHECKPOINT_ONLINE redo 0/7A0000C8; tli
8; prev tli 8; fpw true; xid 0/450172; oid 37074; multi 1; offset 0; oldest
xid 1869 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp
xid: 0/0; oldest running xid 450172; online
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn:
0/7A000170, prev 0/7A000100, desc: RUNNING_XACTS nextXid 450172
latestCompletedXid 450171 oldestRunningXid 450172
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn:
0/7A0001A8, prev 0/7A000170, desc: SWITCH

# pg_xlogdump 00000009000000000000007A
pg_xlogdump: FATAL: could not find a valid record after 0/7A000000

# cat 00000009.history
1 0/3000098 no recovery target specified

2 0/67000098 no recovery target specified

3 0/6B000098 no recovery target specified

4 0/6E000000 no recovery target specified

5 0/70000098 no recovery target specified

6 0/74000000 no recovery target specified

7 0/7A000098 no recovery target specified

8 0/7B000000 no recovery target specified

-Marty Kulma

Browse pgsql-general by date

  From Date Subject
Next Message Edmundo Robles 2016-12-21 15:12:40 Foreign keys fails with partitioned table.
Previous Message Alban Hertroys 2016-12-21 14:43:34 Re: