Problem with hot standby

From:	Michael Blake <postgresql(at)akunno(dot)net>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Problem with hot standby
Date:	2010-12-14 22:12:42
Message-ID:	AANLkTi=Svt92zSdhKTC1H2dFXiqzTfj4E-1NMe0bDD06@mail.gmail.com
Lists:	pgsql-general

I'm trying to set up a master/slave pair of servers. It initially worked
fine, but recently started failing with the following error:

==============
LOG: database system was interrupted; last known up at [time]
LOG: could not open file "pg_xlog/00000001000000000000002B" (log file 0, segment 43): No such file or directory
LOG: invalid checkpoint record
PANIC: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.0/main/backup_label".
LOG: startup process (PID 31489) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
==============
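The file name and the "(log file 0, segment 43)" in the error describe the same segment. In 9.0 a WAL segment file name is three 8-digit hexadecimal fields (timeline ID, log file number, segment number), and 0x2B is 43. A minimal POSIX sh sketch that decodes a name this way; decode_wal_name is a hypothetical helper, and it assumes the 24-character naming that 9.0 uses.

```shell
#!/bin/sh
# Hypothetical helper: split a 9.0-style WAL segment file name into its
# three 8-hex-digit fields (timeline, log file, segment) and print them
# in decimal, matching the "(log file N, segment M)" wording in the log.
decode_wal_name() {
    name=$1
    tli_hex=$(printf '%s' "$name" | cut -c1-8)
    log_hex=$(printf '%s' "$name" | cut -c9-16)
    seg_hex=$(printf '%s' "$name" | cut -c17-24)
    # printf accepts C-style 0x-prefixed hex constants for %d
    printf 'timeline %d, log file %d, segment %d\n' \
        "0x$tli_hex" "0x$log_hex" "0x$seg_hex"
}

decode_wal_name 00000001000000000000002B
# prints: timeline 1, log file 0, segment 43
```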

This is an Ubuntu 10.04 machine with the default Debian configuration,
apart from the following changes:

[Primary: postgresql.conf]
wal_level = hot_standby
max_wal_senders = 1
archive_mode = on
archive_command = 'cp -i %p /var/lib/postgresql/export/9.0/main/%f </dev/null' # Unix
log_statement = 'all'

[Secondary: postgresql.conf]
hot_standby = on

[Secondary: recovery.conf]
standby_mode = 'on'
primary_conninfo = 'host=10.168.60.41 port=5432 user=replication_sys password=XXXXXXXXX'
restore_command = 'cp /var/lib/postgresql/archive/9.0/main/%f "%p"'
#restore_command = '/usr/lib/postgresql/9.0/bin/pg_standby -c -d -s 2 -t /var/log/pgpool/trigger/trigger_file1 /var/lib/postgresql/archive/9.0/main %p >> /var/log/postgresql/postgresql-9.0-standby.log.1 1>&2'
#restore_command = '/usr/lib/postgresql/9.0/bin/pg_standby /var/lib/postgresql/archive/9.0/main %f %p %r'
#archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/archive/9.0/main %r'
#archive_command = 'cp %p /var/lib/postgresql/archive/9.0/main/%f'
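The primary's archive_command copies each completed segment (%p) into the exported directory under its file name (%f), and the standby's restore_command copies segments back out of the mounted archive. A minimal POSIX sh sketch of both sides; archive_wal and restore_wal are hypothetical names, and the existence test makes the no-overwrite intent of `cp -i ... </dev/null` explicit through the exit status, which is what the server checks.

```shell
#!/bin/sh
# Sketch only: shell equivalents of the archive_command (primary) and
# restore_command (standby) settings. PostgreSQL itself just runs the
# configured command strings with %p and %f substituted and checks
# whether they exit with status zero.

# archive_wal PATH FILE ARCHIVE_DIR   (PATH = %p, FILE = %f)
# Copy a finished segment into the archive, refusing to overwrite an
# existing file; a non-zero status means "archiving failed".
archive_wal() {
    if [ -f "$3/$2" ]; then
        echo "archive_wal: $3/$2 already exists" >&2
        return 1
    fi
    cp "$1" "$3/$2"
}

# restore_wal ARCHIVE_DIR FILE PATH   (FILE = %f, PATH = %p)
# Copy a requested segment out of the archive; a non-zero status means
# "this file is not available from the archive".
restore_wal() {
    [ -f "$1/$2" ] || return 1
    cp "$1/$2" "$3"
}

# Roughly equivalent one-liner for the primary (hypothetical wording):
#   archive_command = 'test ! -f /var/lib/postgresql/export/9.0/main/%f && cp %p /var/lib/postgresql/export/9.0/main/%f'
```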

The 'archive directory' mentioned above is an NFS mount of the primary
server's /var/lib/postgresql/export/9.0/main directory.
The mount is working fine, and from the standby I can see the WAL file
named in the error above (00000001000000000000002B) in the archive directory.
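Seeing the file over NFS is not quite the same as the server being able to read it at the exact path restore_command copies from. A small sketch of that check, meant to be run on the standby as the same OS user the server runs as; check_segment is a hypothetical helper, and the defaults are the archive path from recovery.conf and the segment named in the error.

```shell
#!/bin/sh
# Defaults taken from recovery.conf and the error message; override via
# the environment if needed.
ARCHIVE_DIR=${ARCHIVE_DIR:-/var/lib/postgresql/archive/9.0/main}
SEGMENT=${SEGMENT:-00000001000000000000002B}

# check_segment DIR FILE: report whether DIR/FILE exists and is readable
# by the current user; returns non-zero if either check fails.
check_segment() {
    f="$1/$2"
    if [ ! -e "$f" ]; then
        echo "not found: $f"
        return 1
    elif [ ! -r "$f" ]; then
        echo "not readable by $(id -un): $f"
        return 1
    fi
    echo "ok: $f"
}
```

Usage: `check_segment "$ARCHIVE_DIR" "$SEGMENT"`.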

The script I use to bring the standby back up to date after a failure is
as follows; it is run as the postgresql user:

================
#!/bin/sh
SERVER=10.168.60.41
VERSION="9.0"
CLUSTER="main"
DEST_CLUSTER="/var/lib/postgresql/$VERSION/$CLUSTER"
ARCHIVE_CLUSTER="/var/lib/postgresql/archive/$VERSION/$CLUSTER"

# Stop the local (standby) server before replacing its data directory
/etc/init.d/postgresql stop

# Start an online base backup on the master
echo "SELECT pg_start_backup('backup');" | psql --host "$SERVER" --username replication_sys template1

# Discard the standby's old WAL; pg_xlog is not copied from the master
rm -rf "$DEST_CLUSTER/pg_xlog"

# Don't need to ignore postgresql.conf etc. as they are in
# /etc/postgresql as per the Debian standard install
rsync -C -a -c --delete -e ssh --exclude pg_log --exclude pg_xlog \
    --exclude postmaster.pid --exclude postmaster.opts \
    "$SERVER:$DEST_CLUSTER/*" "$DEST_CLUSTER/"

# Recreate an empty pg_xlog with owner-only permissions
mkdir -p "$DEST_CLUSTER/pg_xlog/archive_status"
chmod -R 700 "$DEST_CLUSTER/pg_xlog"

# Stop the backup on the master
echo "SELECT pg_stop_backup();" | psql --host "$SERVER" --username replication_sys template1

/etc/init.d/postgresql start
================
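Before the final start command, a quick check that the data directory holds what recovery will look for can save a restart cycle. A minimal sketch; check_standby_dir is a hypothetical helper. The backup_label test follows from pg_start_backup() writing that file on the master (the HINT in the log refers to it), and the recovery.conf test assumes the 9.0 rule that recovery.conf is read from the data directory ($DEST_CLUSTER here) rather than from /etc/postgresql.

```shell
#!/bin/sh
# check_standby_dir DIR: report any expected file or directory missing
# from the standby's data directory; returns non-zero if anything is.
check_standby_dir() {
    dir=$1
    status=0
    # Written by pg_start_backup() on the master; rsync should copy it.
    if [ ! -f "$dir/backup_label" ]; then
        echo "missing: $dir/backup_label" >&2
        status=1
    fi
    # Assumption: 9.0 looks for recovery.conf in the data directory.
    if [ ! -f "$dir/recovery.conf" ]; then
        echo "missing: $dir/recovery.conf" >&2
        status=1
    fi
    # Recreated by the mkdir step after pg_xlog is deleted.
    if [ ! -d "$dir/pg_xlog/archive_status" ]; then
        echo "missing: $dir/pg_xlog/archive_status" >&2
        status=1
    fi
    return $status
}
```

For example, `check_standby_dir "$DEST_CLUSTER" && /etc/init.d/postgresql start` would start the server only if all three are present.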

So I believe I'm doing it right; I just can't work out why the
pg_xlog error is happening.
