From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | [BUG] Archive recovery failure on 9.3+. |
Date: | 2013-12-12 02:00:02 |
Message-ID: | 20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-hackers |
Hello, we happened to see a server crash during archive recovery
under certain conditions.
After the TLI is incremented, it can happen that the WAL file for
the older timeline is archived while the file with the same
segment id on the newer timeline is not. Archive recovery fails
in that case with a PANIC error like the following,
| PANIC: record with zero length at 0/1820D40
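To make "same segment id, different timeline" concrete, here is a minimal sketch of how the two competing file names could be derived from the LSN in the PANIC message. It assumes the default 16 MB WAL segment size and the usual 24-hex-digit naming (timeline, then two 8-digit segment fields). `wal_file_name` is a hypothetical helper for illustration, not a PostgreSQL function, and the timeline IDs 1 and 2 mentioned afterwards are assumed values.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16 MB WAL segments. */
#define WAL_SEG_SIZE    UINT64_C(0x1000000)
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/*
 * Hypothetical helper: build the WAL segment file name that contains
 * the LSN lsn_hi/lsn_lo on timeline tli.
 */
void
wal_file_name(char *buf, size_t len, uint32_t tli,
              uint32_t lsn_hi, uint32_t lsn_lo)
{
    uint64_t lsn = ((uint64_t) lsn_hi << 32) | lsn_lo;
    uint64_t segno = lsn / WAL_SEG_SIZE;

    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / SEGS_PER_XLOGID),
             (unsigned) (segno % SEGS_PER_XLOGID));
}
```

Under these assumptions, LSN 0/1820D40 falls in segment 000000010000000000000001 on timeline 1 and in 000000020000000000000001 on timeline 2. Only the leading timeline field differs.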
A script to reproduce this is attached. The issue occurs on
9.4dev and 9.3.2, but not on 9.2.6 or 9.1.11. The latter search
pg_xlog for a TLI before trying the archive for older TLIs.
This occurs while fetching the checkpoint redo record in archive
recovery.
> if (checkPoint.redo < RecPtr)
> {
>     /* back up to find the record */
>     record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false);
The cause is that the segment file for the older timeline in the
archive directory is preferred over the one for the newer
timeline in pg_xlog.
Looking into pg_xlog before trying the older TLIs in the archive,
as 9.2 and earlier do, fixes this issue. The attached patch is
one possible solution for 9.4dev.
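The difference between the two search orders can be sketched roughly as below. This is a simplified model and not the actual xlog.c code. All type and function names are hypothetical, and the timeline list is assumed to be ordered newest first.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { FROM_NONE, FROM_ARCHIVE, FROM_PG_XLOG } WalSource;

/* Where copies of one segment exist for a given timeline (hypothetical). */
typedef struct {
    unsigned tli;
    bool     in_archive;
    bool     in_pg_xlog;
} SegmentCopies;

typedef struct {
    unsigned  tli;
    WalSource source;
} Choice;

/* Order described for 9.3+: every TLI in the archive, then pg_xlog. */
Choice
choose_archive_first(const SegmentCopies *tles, size_t n)
{
    Choice c = {0, FROM_NONE};
    size_t i;

    for (i = 0; i < n; i++)
        if (tles[i].in_archive) {
            c.tli = tles[i].tli; c.source = FROM_ARCHIVE;
            return c;
        }
    for (i = 0; i < n; i++)
        if (tles[i].in_pg_xlog) {
            c.tli = tles[i].tli; c.source = FROM_PG_XLOG;
            return c;
        }
    return c;
}

/* Order described for 9.2 and earlier: archive, then pg_xlog, per TLI. */
Choice
choose_per_timeline(const SegmentCopies *tles, size_t n)
{
    Choice c = {0, FROM_NONE};
    size_t i;

    for (i = 0; i < n; i++) {
        if (tles[i].in_archive) {
            c.tli = tles[i].tli; c.source = FROM_ARCHIVE;
            return c;
        }
        if (tles[i].in_pg_xlog) {
            c.tli = tles[i].tli; c.source = FROM_PG_XLOG;
            return c;
        }
    }
    return c;
}
```

With the newer timeline's segment only in pg_xlog and the older timeline's segment only in the archive, the first function returns the older timeline's archived file, while the second returns the newer timeline's file from pg_xlog.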
Attached files are,
- recvtest.sh: Reproduction script. Steps 1 and 2 set up the
condition and step 3 triggers the issue.
- archrecvfix_20131212.patch: The patch that fixes the issue.
With this patch, archive recovery reads pg_xlog before trying
older TLIs in the archive, as 9.2 and earlier do.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
unknown_filename | text/plain | 1.7 KB |
archrecvfix_20131212.patch | text/x-patch | 849 bytes |