WAL ends before end time of backup dump

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	WAL ends before end time of backup dump
Date:	2006-11-07 19:15:38
Message-ID:	1162926938.31124.411.camel@dogma.v10.wvs
Lists:	pgsql-general

Version: 8.1.4

I am having a problem restoring one of my base backups. I have already
taken a successful backup of the production DB since this one, and this
is just a routine test, so fortunately it's not an emergency.

I think that I either have a corrupted base backup or corrupted WAL
segments, or maybe I hit some strange bug.

When I try to restore, I point recovery.conf to the full set of archived
WAL segments, and get the following result:

[snip]
LOG: restored log file "00000001000000170000002B.004A3CAC.backup" from archive
LOG: restored log file "00000001000000170000002B" from archive
LOG: checkpoint record is at 17/2B4CDC58
LOG: redo record is at 17/2B4A3CAC; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 41438715; next OID: 42280
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: automatic recovery in progress
LOG: redo starts at 17/2B4A3CAC
LOG: record with zero length at 17/2B6EACC8
LOG: redo done at 17/2B6EAC84
LOG: restored log file "00000001000000170000002B" from archive
PANIC: WAL ends before end time of backup dump
LOG: startup process (PID 88979) was terminated by signal 6
LOG: aborting startup due to startup process failure

If I restore from the earlier base backup, which grinds slowly through a
week's worth of WAL segments, it stops at segment
00000001000000170000002B, like so:

[ snip ]
LOG: restored log file "00000001000000170000002B" from archive
LOG: record with zero length at 17/2B6EACC8
LOG: redo done at 17/2B6EAC84
LOG: restored log file "00000001000000170000002B" from archive
LOG: archive recovery complete
LOG: database system is ready
LOG: transaction ID wrap limit is 1094453440, limited by database "postgres"

If I restore from a later backup, everything works fine.

The thing that stands out to me about the base backup that doesn't work
is that it took several WAL segments to complete. Here's the .backup
file for the base backup that fails:

$ cat wal/00000001000000170000002B.004A3CAC.backup
START WAL LOCATION: 17/2B4A3CAC (file 00000001000000170000002B)
STOP WAL LOCATION: 17/397B7D64 (file 000000010000001700000039)
CHECKPOINT LOCATION: 17/2B4CDC58
START TIME: 2006-11-05 01:00:01 PST
LABEL: 20061105010001.27375.tar.gz
STOP TIME: 2006-11-05 01:14:03 PST
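
A minimal sketch of the comparison that appears to be at issue, assuming the PANIC is raised when replay ends before the STOP WAL LOCATION recorded in the .backup file, and assuming the default 16 MB WAL segment size. The helper names (parse_lsn, segment_name, parse_backup_history) are hypothetical and are not part of PostgreSQL. The locations are copied from the .backup file and the log output above.

```python
import re

SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_lsn(text):
    """Turn a 'log/offset' hex location such as '17/2B4A3CAC' into a tuple."""
    log, offset = text.split("/")
    return int(log, 16), int(offset, 16)


def segment_name(lsn, timeline=1):
    """Build the WAL file name (timeline, log id, segment number) for a location."""
    log, offset = lsn
    return "%08X%08X%08X" % (timeline, log, offset // SEG_SIZE)


def parse_backup_history(text):
    """Extract the START and STOP WAL locations from a .backup file."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(START|STOP) WAL LOCATION: ([0-9A-F]+/[0-9A-F]+)", line)
        if m:
            fields[m.group(1).lower()] = parse_lsn(m.group(2))
    return fields


history = """\
START WAL LOCATION: 17/2B4A3CAC (file 00000001000000170000002B)
STOP WAL LOCATION: 17/397B7D64 (file 000000010000001700000039)
CHECKPOINT LOCATION: 17/2B4CDC58
"""

info = parse_backup_history(history)
end_of_wal = parse_lsn("17/2B6EACC8")  # "record with zero length at" from the log

print(segment_name(info["start"]))  # 00000001000000170000002B
print(segment_name(info["stop"]))   # 000000010000001700000039
print(end_of_wal < info["stop"])    # True: replay stopped short of the backup's stop location
```

Under these assumptions, replay ended inside segment 2B while the backup did not finish until segment 39, which is consistent with the error message.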

I noticed that 8.2beta3 included a fix for WAL replay. Is that related?
Can someone link to the thread about that bug? I can't reproduce the
problem to test against newer versions of PostgreSQL, because all my
other backups seem to work.

Basically, I'd just like to know what happened so that I can prevent it
in the future. I am archiving to an NFS mount, and I don't know whether
that carries a risk of corruption.
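
One way to look for damage in the archive is to check that every segment the failing backup needs is present and full-sized. This is only a sketch under stated assumptions: the archive directory is "wal" (as in the cat command above), segments are stored uncompressed at the default 16 MB, and the whole range stays within log id 17 on timeline 1. The function names are hypothetical. A size check cannot detect a full-sized file whose tail is zeroed, so a clean result does not prove the segments are intact.

```python
import os

SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def expected_segments(start_seg, stop_seg, log_id=0x17, timeline=1):
    """List WAL file names for a segment range within a single log id."""
    return ["%08X%08X%08X" % (timeline, log_id, seg)
            for seg in range(start_seg, stop_seg + 1)]


def check_archive(archive_dir, names, seg_size=SEG_SIZE):
    """Return (name, problem) pairs for missing or wrongly sized segments."""
    problems = []
    for name in names:
        path = os.path.join(archive_dir, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif os.path.getsize(path) != seg_size:
            problems.append((name, "unexpected size %d" % os.path.getsize(path)))
    return problems


# Segments 2B through 39 cover the START and STOP locations in the .backup file.
names = expected_segments(0x2B, 0x39)

if os.path.isdir("wal"):  # hypothetical archive location
    for name, problem in check_archive("wal", names):
        print(name, problem)
```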

Regards,
Jeff Davis

