Re: purpose/use of pg_start_backup() pg_stop_backup()

From: "Andrew W(dot) Gibbs" <awgibbs(at)awgibbs(dot)com>
To: Jim Mercer <jim(at)reptiles(dot)org>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: purpose/use of pg_start_backup() pg_stop_backup()
Date: 2013-04-06 18:07:11
Message-ID: 20130406180711.GA27333@raptor.commandosoftware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

These functions aren't actually doing anything overly magical. They
don't cause you to end up with a "nice clean snapshot", at least not
right away. They're mostly just helping with the housekeeping tasks of
making sure that in the future you have kept around the WAL segments
that will be necessary to perform a replay of WAL segments to a
consistent state.

The pg_start_backup function is forcing a checkpoint in the database,
while the pg_stop_backup function is executing an immediate WAL
segment switch so that it can write the current one to the archive.
The pg_start_backup function also writes a backup_label file to disk
that can be used by a future system recovering from your file system
archive to know from where to start replaying WAL segments to reach a
consistent state. You want to wait for pg_start_backup to finish
before you start making your archive and wait until it is finished
before invoking pg_stop_backup (which itself will block until it is
satisfied that WAL segments have been archived).

Another piece to all this is putting a recovery.conf file in the
unpacked directory that provides a restore_command that tells the
system how to retrieve a WAL segment. Ordinarily when a PostgreSQL
system experiences some kind of crash and has to regain a consistent
state, it does so by replaying WAL segments from a safe "redo"
location (relying on the properties of WAL segment application), and
it retrieves these segments from pg_xlog. When you're recovering a
system from an archive, however, you'll want to have created the
archive without capturing the pg_xlog directory, and furthermore
(obviously) time has marched on from the point that you did the
archive, so necessary segments won't be there. Consequently you have
to tell the system how to request WAL segments. And you'll need to
have been previously putting these segments somewhere by configuring
your master database to archive them by setting archive_mode and
archive_command.

The documentation is fairly complete, but there are a few places where
it's critical to be careful, so give it a thorough read-through a few
times. Also make sure that you regularly perform restore operations
and have good tests to convince yourself that it all works.

Meanwhile, I recently found myself in the position of wanting to
perform base backups off of a PostgreSQL-8.4 server that was in warm
standby so as to take that I/O strain off of the production server
after having set up PITR. I was told "not possible", but with some
deep dives into the documentation and some testing, I've ended up with
procedures that work and are reasonably simple:

* I invoke the pg_control shell command and extract the "redo" location
* I copy a snapshot of ${PGDATA}/global/pg_control to ${PGDATA}/global/pg_control_snapshot_xxxxxx
* I create a tarball of ${PGDATA} and name the archive in a way that includes the redo location's address
* When I do an archive restore, I copy the snapshotted pg_control file back to where it should be; the recovery.conf file is already conveniently there
* When I am managing my WAL segment archive, I do so with a script that is invoked with an archive's name and that trims only WAL segments before the redo location

I had to do this because you cannot run the pg_start_backup and
pg_stop_backup commands on a warm standby. Hot standby is not an
option until 9.x and I haven't yet had the luxury to make that jump.

So, not magic... The two really crucial pieces of this are:

* by capturing the redo location and snapshotting the pg_control file, at worst you make Postgres replay from WAL segments further back than was necessary
* the pg_control file is 8K and written atomically; as I understand, this is necessary to ensure that PostgreSQL is recoverable from power loss or crashes

For anyone else reading this thread, consider it a follow up for my
query back in November...

http://www.postgresql.org/message-id/20121129015217.GA9975@raptor.commandosoftware.com

Repeated tests seem to indicate that the procedure is sound, but I'd
love to hear if anyone thinks this is preposterous, since I've got a
production system relying on it and I am aware that absence of proof
does not constitute proof of absence.

I initially got really confusing results when I tried to tell tar to
capture pg_control first, and thought I had, but hadn't actually. The
database _seemingly_ recovered to a consistent state, even to the
point of accepting connections and having _some_ queries run
correctly, but for certain SELECT queries the system would complain
about missing pg_clog files, which at first had me scared because I
found old posts about PostgreSQL having some bugs pertaining to the
management of pg_clog with WAL, but I later realized my pg_control
snapshotting failure, and I haven't since seen the pg_clog issue,
which I hope means that I am in the clear.

But, Jim, if you're on the 9.x series, or if you're taking your
regular dumps from the master system, then you don't have to worry
overly much about the latter part of this email, except in that it
might help demystify some of what PostgreSQL is doing for you.

-- AWG

On Sat, Apr 06, 2013 at 12:46:05PM -0400, Jim Mercer wrote:
>
> on the surface, the functions pg_start_backup() and pg_stop_backup() seem to
> indicate something very useful.
>
> as i understand it, i could do:
>
> psql -c "pg_start_backup('$(date +%Y%m%d%H%M%S)');"
> snapshot/dump $PGDATA
> psql -c "pg_stop_backup();"
>
> and this would then create a nice clean snapshot.
>
> however, i'm trying to work out what the restore process is.
>
> its not clear to me if one needs to do anything after restoring the dump.
>
> or, if there is some sort of incremental process that needs to be followed
> after the dump.
>
> can someone provide a walk-through of using these functions both for
> a backup and a restore?
>
> --
> Jim Mercer Reptilian Research jim(at)reptiles(dot)org +1 416 410-5633
> "He who dies with the most toys is nonetheless dead"
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Nik Tek 2013-04-07 04:59:16 Find how much memory is postgres using
Previous Message Ray Stell 2013-04-06 17:38:31 Re: purpose/use of pg_start_backup() pg_stop_backup()