base backup/restore + streaming replication => weirdness

From: domehead100 <domehead100(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: base backup/restore + streaming replication => weirdness
Date: 2013-02-22 22:11:03
Message-ID: 1361571063448-5746342.post@n5.nabble.com
Lists: pgsql-admin

I have a smallish Postgres 9.0 database with Primary and Standby instances.

The instances are set up with streaming replication from the Primary to
the Standby. The Primary also archives WAL files to a shared directory that
is accessible from the Standby. The Standby runs as a hot standby and
receives WAL records from the Primary over TCP.

This week the shared directory where WAL files are archived (/pgsql_wal)
ran out of space.

To restart replication, I performed a base backup on the Primary (tar $PGDATA
to /pgsql_wal) and then performed a base restore (untar) on the Standby.

After this, the Standby stays in recovery mode (recovery.conf never gets
renamed to recovery.done), and my check_replication.sh script shows strange
results: the WAL location reported by the Primary (first item below) is
completely different from both the received and the replayed locations on
the Standby.

Primary:
 pg_current_xlog_location
--------------------------
 1E/D5C40A40    <= this looks strange
(1 row)

Standby, last received:
 pg_last_xlog_receive_location
-------------------------------
 E/BF68BD08
(1 row)

Standby, last applied:
 pg_last_xlog_replay_location
------------------------------
 E/BF68BD08
(1 row)
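
Each location above is a pair of hexadecimal numbers, a log id and an offset within that log, so 1E and E are 16 log ids apart. To make the gap concrete, the values can be converted to byte positions and subtracted. The following is only a sketch: xlog_to_bytes and xlog_lag_bytes are hypothetical helper names, and the 0xFF000000 bytes per log id is an assumption that applies to releases before 9.3.

```shell
#!/bin/sh
# Sketch: convert a WAL location such as 1E/D5C40A40 to a byte position.
# Assumption: pre-9.3 layout with 0xFF000000 usable bytes per log id.
xlog_to_bytes() {
    logid=${1%%/*}     # text before the slash, e.g. 1E
    offset=${1##*/}    # text after the slash, e.g. D5C40A40
    echo $(( 0x$logid * 0xFF000000 + 0x$offset ))
}

# Difference in bytes between two locations (first minus second).
xlog_lag_bytes() {
    echo $(( $(xlog_to_bytes "$1") - $(xlog_to_bytes "$2") ))
}

# Primary location versus the Standby's received location from the output above
xlog_lag_bytes 1E/D5C40A40 E/BF68BD08
```

A result of several gigabytes would be hard to reconcile with two instances that return the same max(<primary_key>), so it may be worth checking which instance each query in check_replication.sh actually connects to.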

I can connect to the Standby, and a select query seems to indicate that the
databases are in sync (they return the same value for max(<primary_key>) on
a table that is constantly receiving inserts).

One concern is that my tar command apparently did not exclude the files in
$PGDATA/pg_xlog, so those got untarred on the Standby. Could that be a
problem?
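
One way to confirm whether WAL files ended up in the backup is to list the archive and count the entries stored below pg_xlog. This is a minimal sketch, assuming GNU tar and a gzip-compressed archive; count_xlog_members is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count archive members stored below a pg_xlog directory.
# Prints 0 when only the (empty) pg_xlog directory itself was archived.
count_xlog_members() {
    # List member names and keep those with something after "pg_xlog/"
    tar tzf "$1" | grep -c 'pg_xlog/.'
}

# Example, using the archive path from the backup script:
# count_xlog_members /pgsql_wal/backup/pg_base_backup.tgz
```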

Here's my basebackup.sh:
#! /bin/sh
# Base Backup script for streaming replication

BACKUP_FILE=/pgsql_wal/backup/pg_base_backup.tgz

psql -c "SELECT pg_start_backup('$BACKUP_FILE', true)" postgres

rm -rf $BACKUP_FILE

nice -n 10 tar czvpf $BACKUP_FILE --exclude={"$PGDATA/pg_xlog/*"} $PGDATA

psql -c "SELECT pg_stop_backup()" postgres
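
A likely reason the exclude has no effect is the brace syntax: with a single item inside the braces there is nothing to expand, so tar receives the braces as part of the pattern and no file name matches it. The sketch below passes the pattern as one quoted word instead. It assumes GNU tar, and make_base_tar is a hypothetical helper name, not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: archive a data directory without the files in pg_xlog.
# Assumes GNU tar. The pattern is a single quoted word, so tar receives
# "<datadir>/pg_xlog/*" and still stores the empty pg_xlog directory.
make_base_tar() {
    backup_file=$1
    datadir=$2
    tar czpf "$backup_file" --exclude="$datadir/pg_xlog/*" "$datadir"
}

# The brace form is passed through unchanged because it holds only one item:
echo --exclude={"/data/pg_xlog/*"}
# prints: --exclude={/data/pg_xlog/*}
```

In the backup script this would correspond to calling make_base_tar "$BACKUP_FILE" "$PGDATA" between pg_start_backup and pg_stop_backup.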

And here's my baserestore.sh:
#!/bin/sh
# Base Recovery script for streaming replication (run on Standby)
# Run as postgres user
# Postgres should be stopped

# Refuse to run without PGDATA, since the data directory is removed below
: "${PGDATA:?PGDATA must be set}"

DATE=`date +%Y_%m_%d`   # %m is the month (%M would be the minute)
CONF_BACKUP_DIR=/tmp/pgsql_conf_backup_$DATE
BASE_BACKUP_FILE=/pgsql_wal/backup/pg_base_backup.tgz

#backup config files
mkdir -p "$CONF_BACKUP_DIR"
cp "$PGDATA"/*.conf "$CONF_BACKUP_DIR"
cp "$PGDATA/recovery.done" "$CONF_BACKUP_DIR"

#blow away existing data directory
rm -rf "$PGDATA"

#untar base backup file (member names are relative to /)
cd /
tar xzvf "$BASE_BACKUP_FILE"

#copy configs back
cp "$CONF_BACKUP_DIR"/*.conf "$PGDATA"
cp "$CONF_BACKUP_DIR/recovery.done" "$PGDATA/recovery.conf"
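
The last step installs the saved recovery.done as recovery.conf. If that file sets standby_mode to on, the Standby is expected to remain in recovery and keep the name recovery.conf until it is promoted, so the missing rename alone does not have to mean the restore failed. A small sketch for checking the setting follows; standby_mode_enabled is a hypothetical helper name, and the pattern only covers the common spellings of an enabled value.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given recovery.conf enables standby_mode.
# Only common spellings of an enabled value are recognised; this is a sketch,
# not a full parser for PostgreSQL configuration files.
standby_mode_enabled() {
    grep -Eiq "^[[:space:]]*standby_mode[[:space:]]*=[[:space:]]*'?(on|true|yes|1)'?" "$1"
}

# Example:
# standby_mode_enabled "$PGDATA/recovery.conf" && echo "standby mode is on"
```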

