Re: Cannot rebuild a standby server

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: John Scalia <jayknowsunix(at)gmail(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Cannot rebuild a standby server
Date: 2014-06-20 20:36:09
Message-ID: 1403296569.12353.YahooMailNeo@web122303.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

John Scalia <jayknowsunix(at)gmail(dot)com> wrote:
> On 6/20/2014 2:09 PM, Kevin Grittner wrote:

>> The documentation is your friend.  It gives pretty specific
>> instructions for what to do.

> Well, I did finally get it working by adding -X s -c fast to the
> pg_basebackup command.

Hopefully you did not follow that with a copy of pg_xlog files from
a running server.

> Kevin, if I didn't copy WALs over, the database still refused to
> start as it claimed it was looking for a one of the first
> specific files.

You absolutely need WAL files for recovery.  You absolutely should
not copy the files ad hoc from a running cluster, as they may be in
flux at the time of the copy.  The archive_command is called at the
right point in time to get a good copy, and using the pg_basebackup
command will also acquire them in a safe way.

> Also, I've not seen any references to removing certain files like
> a backup_label file in the standby's data directory causing
> problems.

On this page:

http://www.postgresql.org/docs/current/static/continuous-archiving.html

... it says:

| pg_start_backup creates a backup label file, called backup_label,
| in the cluster directory with information about your backup,
| including the start time and label string. The file is critical
| to the integrity of the backup, should you need to restore from
| it.

The same page later says:

| It's also worth noting that the pg_start_backup function makes a
| file named backup_label in the database cluster directory, which
| is removed by pg_stop_backup. This file will of course be
| archived as a part of your backup dump file. The backup label
| file includes the label string you gave to pg_start_backup, as
| well as the time at which pg_start_backup was run, and the name
| of the starting WAL file. In case of confusion it is therefore
| possible to look inside a backup dump file and determine exactly
| which backup session the dump file came from. However, this file
| is not merely for your information; its presence and contents are
| critical to the proper operation of the system's recovery
| process.

These are related to the pg_start_backup function and apply whether
you run that function directly or use pg_basebackup, which runs it.

> The other files I removed were the old postgresql.pid file from
> the primary

You should remove that, or exclude it from your backup in the first
place.

> and a file called archiving_active, which I use for controlling
> whether postgresql writes WAL files or not.

That's based on your particular archiving strategy; without more
info on that I can't comment.

> Seems a little funny to me that I've done this same procedure for
> over 4 months with no problems, and today was the first time it
> bit me.

It's sorta like crossing a highway without looking either way.  It
might work many times.  Then again, a Mac truck might be coming
just that one time.  If you would like to have it work
consistently, rather than gamble each time, don't copy WAL files
from the pg_xlog directory of a live server except through the
documented methods, and don't delete the backup_label file.  If you
do either of those things you risks errors like you just saw.
Worse, you risk not getting errors but having a silently corrupted
database.  Unless you enjoy the thrill of it....  ;-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Oliver 2014-06-23 08:31:06 Re: Postgresql not getting assigned memory
Previous Message John Scalia 2014-06-20 19:00:56 Re: Cannot rebuild a standby server