From: | bohmer(at)visionlink(dot)org |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog |
Date: | 2013-04-05 19:27:29 |
Message-ID: | E1UOCIL-0004ll-Qt@wrigleys.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
The following bug has been logged on the website:
Bug reference: 8043
Logged by: Jeff Bohmer
Email address: bohmer(at)visionlink(dot)org
PostgreSQL version: 9.2.4
Operating system: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5
Description:
Hi pgsql-bugs,
PG version: 9.2.4, from yum.postgresql.org
OS: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5
Upgrading from 9.2.3 to 9.2.4 has broken archive recovery for me. Probably
related to this 9.2.4 change:
Ensure we do crash recovery before entering archive recovery,
if the database was not stopped cleanly and a recovery.conf
file is present (Heikki Linnakangas, Kyotaro Horiguchi,
Mitsumasa Kondo)
When starting the PostgreSQL 9.2.4 service on a base backup, I get this:
2013-04-05 12:49:04 MDT [10302]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-04-05 10:02:01 MDT
2013-04-05 12:49:04 MDT [10302]: [2-1] user=,db= LOG: starting archive
recovery
2013-04-05 12:49:04 MDT [10302]: [3-1] user=,db= LOG: could not open file
"pg_xlog/000000010000002F0000002D" (log file 47, segment 45): No such file
or directory
2013-04-05 12:49:04 MDT [10302]: [4-1] user=,db= LOG: invalid primary
checkpoint record
2013-04-05 12:49:04 MDT [10302]: [5-1] user=,db= LOG: could not open file
"pg_xlog/000000010000002F0000002C" (log file 47, segment 44): No such file
or directory
2013-04-05 12:49:04 MDT [10302]: [6-1] user=,db= LOG: invalid secondary
checkpoint record
2013-04-05 12:49:04 MDT [10302]: [7-1] user=,db= PANIC: could not locate a
valid checkpoint record
2013-04-05 12:49:04 MDT [10297]: [2-1] user=,db= LOG: startup process (PID
10302) was terminated by signal 6: Aborted
2013-04-05 12:49:04 MDT [10297]: [3-1] user=,db= LOG: aborting startup due
to startup process failure
The WAL file 000000010000002F0000002D does exist in my WAL archive, but not
in pg_xlog. I exclude pg_xlog files when taking the base backup, per the
instructions from
<http://www.postgresql.org/docs/9.2/static/continuous-archiving.html>:
You can, however, omit from the backup dump the files within
the cluster's pg_xlog/ subdirectory. This slight adjustment
is worthwhile because it reduces the risk of mistakes when
restoring.
I use a custom base backup script to call pg_start/stop_backup() and make
the backup with rsync.
The restore_command in recovery.conf is never called by PG 9.2.4 during
startup. I confirmed this by adding a "touch /tmp/restore_command.`date
+%H:%M:%S`" line at the beginning of the shell script I use for my
restore_command. No such files are created when starting PG 9.2.4.
After downgrading back to 9.2.3, archive recovery works using the very same
base backup, recovery.conf file, and restore_command. The log indicates that
PG 9.2.3 begins recovery by pulling WAL files from the archive instead of
pg_xlog:
2013-04-05 13:01:14 MDT [16981]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-04-05 10:02:01 MDT
2013-04-05 13:01:14 MDT [16981]: [2-1] user=,db= LOG: starting archive
recovery
2013-04-05 13:01:14 MDT [16981]: [3-1] user=,db= LOG: restored log file
"000000010000002F0000002D" from archive
2013-04-05 13:01:14 MDT [16981]: [4-1] user=,db= LOG: consistent recovery
state reached at 2F/2D000080
2013-04-05 13:01:14 MDT [16981]: [5-1] user=,db= LOG: redo starts at
2F/2D000080
2013-04-05 13:01:15 MDT [16981]: [6-1] user=,db= LOG: restored log file
"000000010000002F0000002E" from archive
2013-04-05 13:01:15 MDT [16981]: [7-1] user=,db= LOG: restored log file
"000000010000002F0000002F" from archive
<snip>
2013-04-05 13:01:17 MDT [16981]: [25-1] user=,db= LOG: redo done at
2F/3F07B4D0
2013-04-05 13:01:17 MDT [16981]: [26-1] user=,db= LOG: last completed
transaction was at log time 2013-04-05 12:53:01.069826-06
2013-04-05 13:01:17 MDT [16981]: [27-1] user=,db= LOG: restored log file
"000000010000002F0000003F" from archive
2013-04-05 13:01:17 MDT [16981]: [28-1] user=,db= LOG: selected new
timeline ID: 2
2013-04-05 13:01:17 MDT [16981]: [29-1] user=,db= LOG: archive recovery
complete
2013-04-05 13:01:17 MDT [16991]: [1-1] user=,db= LOG: checkpoint starting:
end-of-recovery immediate wait
2013-04-05 13:01:17 MDT [16991]: [2-1] user=,db= LOG: checkpoint complete:
wrote 327 buffers (0.1%); 0 transaction log file(s) ad
ded, 0 removed, 0 recycled; write=0.015 s, sync=0.000 s, total=0.035 s; sync
files=0, longest=0.000 s, average=0.000 s
2013-04-05 13:01:17 MDT [16978]: [2-1] user=,db= LOG: database system is
ready to accept connections
So, the behavior has definitely changed between 9.2.3 and 9.2.4. Is this a
bug in 9.2.4?
Or, must I now include pg_xlog files when taking base backups with 9.2.4,
contrary to the documentation?
Or, is there a way to make PG 9.2.4 use restore_command for recovery, as
9.2.3 does?
Thank you,
- Jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2013-04-05 22:02:28 | Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog |
Previous Message | Sergey Konoplev | 2013-04-05 14:44:44 | Re: Completely broken replica after PANIC: WAL contains references to invalid pages |
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Smith | 2013-04-05 19:39:51 | Re: Enabling Checksums |
Previous Message | Robert Haas | 2013-04-05 19:08:10 | Re: Page replacement algorithm in buffer cache |