Quick Links

Re: Issues Outstanding for Point In Time Recovery (PITR)

From:	Richard Tucker <richt(at)multera(dot)com>
To:	"J(dot) R(dot) Nield" <jrnield(at)usol(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	PostgreSQL Hacker <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Issues Outstanding for Point In Time Recovery (PITR)
Date:	2002-07-17 21:45:53
Message-ID:	EKEKLEKKLDAEEKDBDMMAEEOBCCAA.richt@multera.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

We also have implemented a roll forward recovery mechanism. We modified a
7.2.1 version of Postgres.
The mechanism is designed to provide a means of recoverying from the loss or
corruption of media. It provides for duplicating wal_files so that if a
wal_file is lost roll forward recovery can recover the database using the
duplicated wal_files. Hooks were also added so that the roll forward
recovery mechanism can be used to implement a hot standby database.
Along with the roll forward recovery mechanism we have also implemented an
online database backup utility which is synchronized with the recovery log
so that the backup can be a starting point for a roll forward recovery
session.

Roll forward recovery is enabled for a database cluster by specifying one of
two postmaster configuration parameters.

wal_file_reuse = false This parameter tells the wal system to not reuse wal
files. This option is intended for sites wishing to implement a hot standby
database where wal files will be periodically copied to another machine
where they will be rolled forward into the standby database.

wal_file_duplicate = <directory_path> This parameter tells the system to
mirror the files in the pg_xlog directory to the specified directory. This
allows for the recovery of a database where a wal file has been damaged or
lost. It also allows for a variant of a hot standby database where the
duplicate directory is the pg_xlog directory of the standby database.

Since both of these options cause wal files to accumulate indefinately the
dba needs a means of purging wal files when they are no longer needed. So
an sql command, "ALTER SYSTEM PURGE WAL_FILES <wal_file_name>", has also
been implemented. This command deletes all wal files up to and including
the specified <wal_file_name> as long as those wal files are not needed to
recover the database in the event of a system crash. To find out the status
of the wal files a function has been implemented to return a wal file name.
The function is:

Wal_file( <request_tye>)
Request_type := [ current | last | checkpoint | oldest]

Wal_file ('current') returns the name of the log file currently being
written to.
Wal_file('last') returns the name of the last log file filled.
Wal_file('checkpoint') returns the name of the file containing the current
redo position. The current redo position is the position in the recovery
log where crash recovery would start if the system were to crash now. All
logs prior to this one will not be needed to recover the database cluster
and could be safely removed.
Wal_file('oldest') returns the oldest xlog file found in the pg_xlog
directory.

To actually perform a roll forward you use the postmaster configuration
parameter "roll_forward=yes". This parameter tells the startup process to
perform crash recovery even though the state of the database as found in the
pg_control file indicates a normal shutdown. This is necessary since the
starting point of roll forward session could be the restore of a database
cluster that was shutdown in order to back it up. Furthermore this
parameter tells the startup process not to write out a checkpoint record at
the end of the roll forward session. This allows for the database cluster to
receive subsequent wal files and to have those rolled forward as well. When
starting the postmaster with the roll_forward=yes option, it shuts down the
database as soon as the startup process completes. So the idea is to
restore a backup, copy all of your saved/duplicated wal files into the
pg_xlog directory of the restored database and start the postmaster with the
roll_forward option.

For point in time recovery there is also a roll_forward_until = <time> which
rolls forward through the wal files until the first transaction commit note
that is greater than or equal to the specified time.

The pg_copy utility performs an on line copy of a database cluster. Its
syntax is:
pg_copy <backup_directory> [-h host] [-p port] ...
This makes a copy of the database where backup_directory is what you would
set PGDATA to in order start a postmaster against the backup copy. The
database can be being updated while the copy occurs. If you start a
postmaster against this copy it will appear to the startup process as a
database that crashed at the instant the pg_copy operation completed.
Futhermore the pg_copy utility automatically removes any wal files not
needed to recover the database from either pg_xlog directory or the
wal_file_duplicate directory.

So a DBA to protect the database from media loss just needs to set the
wal_file_duplicate paramater and periodically pg_copy the database.

The BIG THING we have not done is address the issue that add/drop tables and
indexes do not propagate through the roll forward recovery mechanism
properly.

-regards
Richard Tucker

In response to

Re: Issues Outstanding for Point In Time Recovery (PITR) at 2002-07-17 04:24:10 from J. R. Nield

Responses

Re: Issues Outstanding for Point In Time Recovery (PITR) at 2002-07-17 22:17:25 from Bruce Momjian
Re: Issues Outstanding for Point In Time Recovery (PITR) at 2002-07-17 22:43:39 from D'Arcy J.M. Cain

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2002-07-17 22:04:17	Re: error codes
Previous Message	Neil Conway	2002-07-17 20:17:58	Re: error codes