From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | John Scalia <jayknowsunix(at)gmail(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: Cannot rebuild a standby server |
Date: | 2014-06-20 18:09:16 |
Message-ID: | 1403287756.14509.YahooMailNeo@web122305.mail.ne1.yahoo.com |
Lists: | pgsql-admin |
John Scalia <jayknowsunix(at)gmail(dot)com> wrote:
> In the true definition of insanity, I've tried to rebuild a standby
> streaming replication server using the following steps several times:
>
> 1) ensure the postgresql data directory, /var/lib/pgsql/9.3/data, is empty.
> 2) run: pg_basebackup -h <primary server> -D /var/lib/pgsql/9.3/data
> 3) manually copy the WAL files from the primary server's pg_xlog directory
> to the directory specified in the standby's recovery.conf restore_command.
Step 3 is enough to cause database corruption on the replica.
> 4) rm any artifacts from the standby's new data directory like the
> backup_label file.
So is that.
> 5) copy the saved recovery.conf into the standby's data directory and check
> it is accurate.
> 6) Start the database using "service postgresql-9.3 start"
>
> Every time, however, the following appears in the pg_log/postgresql-Fri.log:
> <timestamp> LOG: entering standby mode
> <timestamp> LOG: restored log file "00000003.history"
> <timestamp> LOG: invalid secondary checkpoint record
> <timestamp> PANIC: could not locate a valid checkpoint record
Yep, that's about the best result you can expect with the above
procedure. Occasionally the server will start anyway, but if it
does there will almost certainly be data loss or corruption.
> All this was originally caused by testing the failover mechanism in pgpool.
> That didn't succeed and I'm trying to get the servers back to their original
> states. I've done this kind of thing before, but don't know what's wrong
> with this effort. What have I missed?
You should enable WAL archiving and the restore_command in
recovery.conf should copy WAL files from the archive. The pg_xlog
directory should be empty when starting recovery unless the primary
is stopped and you only copy pg_xlog files from the stopped server
into the pg_xlog directory of the recovery cluster. Don't delete
the backup_label file, because it has the information recovery
needs about the point from which it should start WAL replay --
without it, it will have to guess, and is very likely to get that
wrong.
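
As a rough illustration of the setup described above, the settings
might look something like the following for a 9.3 cluster. This is
only a sketch: the archive directory /mnt/server/archivedir, the host
name primary.example.com, and the user "replicator" are placeholders
chosen for the example, not values from this thread, and the archive
directory is assumed to be reachable from both servers.

```conf
# --- postgresql.conf on the primary (sketch; paths are assumptions) ---
wal_level = hot_standby          # generate enough WAL for a standby
archive_mode = on                # enable WAL archiving
# Copy each completed WAL segment into the archive, refusing to
# overwrite a file that is already there.
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
max_wal_senders = 3              # allow streaming replication connections

# --- recovery.conf on the standby (sketch; host and user are assumptions) ---
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
# Fetch WAL from the archive rather than from hand-copied pg_xlog files.
restore_command = 'cp /mnt/server/archivedir/%f %p'
```

With something along these lines in place, the base backup is taken
with pg_basebackup, recovery.conf is put in the new data directory,
and the server is started with backup_label left where pg_basebackup
put it.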
The documentation is your friend. It gives pretty specific
instructions for what to do.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company