Re: LVM snapshots

From: Murthy Kambhampaty <murthy(dot)kambhampaty(at)goeci(dot)com>
To: 'Peter Eisentraut' <peter_e(at)gmx(dot)net>, Matt <matt(at)ymogen(dot)net>
Cc: "'pgsql-admin(at)postgresql(dot)org'" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: LVM snapshots
Date: 2003-04-03 00:21:18
Message-ID: 2D92FEBFD3BE1346A6C397223A8DD3FC0921C6@THOR.goeci.com
Lists: pgsql-admin

Here's what I have so far:

The attachment includes a revised version of a script for backing up the
database cluster, and the logs from a couple of instances of it; in these
instances the server was under the following workload:
A script that (a rough sketch follows the list):
1. Copies a roughly 380,000-row, 85-column file into a newly created table
called tbl1
2. Creates 3 single-column indexes on tbl1
3. In 3 parallel subshells, runs 3 separate insert ... select ... queries to
put data from tbl1 into tbl2, tbl3, and tbl4
4. Upon completion of step 3, drops tbl1 and repeats the process for 12
different text files with distinct data and similar size
5. Adds 3 single-column indexes to each of tbl2, tbl3 and tbl4 in 3
parallel subshells
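
Purely as an illustration (the actual workload script is in the attachment),
a minimal sketch of that workload, with hypothetical database, column and
file names, might look like:

#!/bin/sh
# Sketch only: the real files have ~380,000 rows and ~85 columns, and
# tbl2, tbl3 and tbl4 are assumed to exist already.
DB=testdb

for f in file01.txt file02.txt file03.txt   # ... through file12.txt
do
    psql -d "$DB" -c "CREATE TABLE tbl1 (col1 int, col2 int, col3 text);"
    psql -d "$DB" -c "\copy tbl1 FROM '$f'"
    for c in col1 col2 col3; do
        psql -d "$DB" -c "CREATE INDEX tbl1_$c ON tbl1 ($c);"
    done
    # three INSERT ... SELECT queries in parallel subshells
    for t in tbl2 tbl3 tbl4; do
        ( psql -d "$DB" -c "INSERT INTO $t SELECT * FROM tbl1;" ) &
    done
    wait
    psql -d "$DB" -c "DROP TABLE tbl1;"
done

# finally, three single-column indexes on each of tbl2, tbl3 and tbl4,
# built in parallel subshells
for t in tbl2 tbl3 tbl4; do
    ( for c in col1 col2 col3; do
          psql -d "$DB" -c "CREATE INDEX ${t}_$c ON $t ($c);"
      done ) &
done
wait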

Server hardware:
Quad PIII Xeon 500 MHz/2MB L2, 6 GB RAM, 168 GB HW RAID5 (data), free space
on second 168 GB HW RAID5 for snapshots on a dual channel Mylex Acceleraid
352, with XFS filesystem logs on a LINUX MD1 connected with an Adaptec
39320-R.

Column sums on the larger tables on the production server and on the backup
server gave identical values. The smaller tables were compared with a SAS
System facility called PROC COMPARE and found to be identical. Note,
however, that (i) my rsync options get the WHOLE FILE each time a table
changes, and (ii) because the attached logs are from tables that were being
updated on the production server, the copies of those tables have not been
verified for "accuracy", only for recoverability (vacuum analyze completes
successfully). Therefore, users in transaction processing environments will
have to do their own testing. (I have verified that doing incremental rsyncs
until the data stabilize is also reliable; I'm just being paranoid on this
one, and the cost is not too high.)
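
For reference, the column-sum check amounts to running a query like the
following (column and table names here are placeholders) against both the
production and the backup server and comparing the output:

psql -d mydb -c "SELECT sum(col1), sum(col2), sum(col3) FROM tbl2;"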

I have also successfully run the backup simultaneously with the above
workload on a dual PIII 1 GHz/512 KB L2 with 528 MB RAM, with the data on a
60 GB IDE disk and the snapshots on a 6 GB IDE disk on the SAME IDE channel.

This version uses a two-phase rsync: the first rsync copies the live $PGDATA
directory, excluding the pg_xlog/ folder, from the production server to the
backup server; the second is run on an LVM snapshot of the $PGDATA folder.
The postmaster is started and stopped on the backup server, so that any
problems can be identified right away. (Notice the "ReadRecord: unexpected
pageaddr 13C/98EDA000 in log file 317, segment 11, offset 15572992" in the
later log. This seems to be a non-critical error; VACUUM ANALYZE gave a
short series of:
NOTICE: Rel <table name>: Uninitialized page 54300 - fixing
...
VACUUM
).

Others have suggested the two-phase rsync, but there the second phase
required shutting down the postmaster. Besides the obvious advantage of
avoiding that shutdown, it turns out that because snapshots are fast --
especially if fsync=true -- the second rsync has little to do (if you turn
off --exclude='pg_xlog/' in the first phase, you may find that the backup is
fully complete and the snapshot is removed almost immediately after
creation). Thus, the time that the snapshot is up and dragging down
filesystem operations is minimized.
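
The attached script is the authoritative version; purely as a sketch of the
overall cycle, assuming $PGDATA is /var/lib/pgsql/data on an XFS filesystem
on the LVM volume /dev/vg0/pgdata (paths, snapshot size and rsync options
below are placeholders, not the exact options in the script), the steps
amount to something like:

# Phase 1: rsync the live cluster, excluding WAL, to the backup host.
rsync -a --delete --exclude='pg_xlog/' /var/lib/pgsql/data/ bkHost:/var/lib/pgsql/data/

# Phase 2: freeze the filesystem just long enough to take an LVM snapshot,
# then rsync again from the snapshot (this time including pg_xlog/).
xfs_freeze -f /var/lib/pgsql/data
lvcreate -s -L 2G -n snap_pgdata /dev/vg0/pgdata
xfs_freeze -u /var/lib/pgsql/data

mkdir -p /mnt/snap_pgdata
mount -o ro,nouuid /dev/vg0/snap_pgdata /mnt/snap_pgdata   # nouuid: XFS snapshot shares the origin's UUID
rsync -a --delete /mnt/snap_pgdata/ bkHost:/var/lib/pgsql/data/
umount /mnt/snap_pgdata
lvremove -f /dev/vg0/snap_pgdata

# On the backup server: start the postmaster against the copy, run
# VACUUM ANALYZE to check recoverability, then stop the postmaster.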

ADDITIONAL NOTES:

Data recovery from backup is done with:
pg_dump -Fc -h bkHost -U pguser [ -t <tbl-to-recover> ] <db-to-recover> |
pg_restore -h pgHost -U pguser -c -d <db-to-recover>

As the 7.4dev docs say:
"WAL offers the opportunity for a new method for database on-line backup and
restore (BAR). To use this method, one would have to make periodic saves of
data files to another disk, a tape or another host and also archive the WAL
log files. The database file copy and the archived log files could be used
to restore just as if one were restoring after a crash. Each time a new
database file copy was made the old log files could be removed. Implementing
this facility will require the logging of data file and index creation and
deletion; it will also require development of a method for copying the data
files (operating system copy commands are not suitable)."

If you can stand doing it with filesystem level tools (LVM or EVMS on linux;
hw snapshots on other OS?), you can do this already, as shown here. Besides
BAR, here are a couple of applications to consider:

1. A limited form of PITR: You can recover any object to its state at the
last snapshot backup. (Of course, log roll-forward is missing.)
From Oracle 9i documentation:
"Introduction to RMAN TSPITR
Recovery Manager (RMAN) automated tablespace point-in-time recovery (TSPITR)
enables you to quickly recover one or more tablespaces to a time that is
different from that of the rest of the database.

RMAN TSPITR is most useful for recovering the following:

* An erroneous DROP TABLE or TRUNCATE TABLE statement
* A table that has become logically corrupted
* An incorrect batch job or other DML statement that has affected only a
subset of the database
...
"
We can do the same here.
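
For instance, recovering a single table that was accidentally dropped on the
production server is just the single-table form of the recovery command
above (all names are placeholders):

pg_dump -Fc -h bkHost -U pguser -t <dropped-tbl> <db-name> |
pg_restore -h pgHost -U pguser -d <db-name>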

2. Version upgrades become much easier: initdb the new version on pgHost,
then run (pg_dumpall -h bkHost | psql template1) from the upgraded server.

3. If $PGDATA/pg_xlog is on a different set of disks than $PGDATA, you should
be able to adapt with:
xfs_freeze -f $PGXLOG_FS
xfs_freeze -f $PGDATA_FS
lvcreate -s -L <size> -n snap_pgdata <the $PGDATA_FS device>
lvcreate -s -L <size> -n snap_pgxlog <the $PGXLOG_FS device>
xfs_freeze -u $PGDATA_FS
xfs_freeze -u $PGXLOG_FS
(on the assumption that the postmaster isn't doing anything while waiting
for pg_xlog/ writes to complete). You can then mount the snapshots in the
proper places, rsync and go. This has not been tested, though it's on my
agenda. Could the postgresql developers comment on this?
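
If it does work, the "mount the snapshots in the proper places, rsync and
go" step would presumably amount to something like this (untested; device
names and mount points are placeholders):

mkdir -p /mnt/snap_pgdata
mount -o ro,nouuid /dev/vg0/snap_pgdata /mnt/snap_pgdata
mount -o ro,nouuid /dev/vg1/snap_pgxlog /mnt/snap_pgdata/pg_xlog
rsync -a --delete /mnt/snap_pgdata/ bkHost:/var/lib/pgsql/data/
umount /mnt/snap_pgdata/pg_xlog
umount /mnt/snap_pgdata
lvremove -f /dev/vg1/snap_pgxlog /dev/vg0/snap_pgdata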

As Matt points out, this strategy is a useful addition to the postgresql
administrator's tools. It seems useful to update the documentation, and to
point out that the WAL design already gives the benefit of online backup and
restore when combined with externally available snapshot facilities.

QED. Cheers,
Murthy

-----Original Message-----
From: Peter Eisentraut [mailto:peter_e(at)gmx(dot)net]
Sent: Sunday, March 16, 2003 09:05
To: Matt
Cc: Murthy Kambhampaty; pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] LVM snapshots

Matt writes:

> If the theory can be robustly demonstrated to work in practice (which
> you may have already done) then I say this strategy needs to be
> recognised in the docs (or at least the techdocs) as a valid and useful
> one.

The note in the documentation is rather old. If you can conclusively
disprove it, then we'll be happy to alter it.

--
Peter Eisentraut peter_e(at)gmx(dot)net

Attachment: pgSnapBack.tgz (application/octet-stream, 5.7 KB)
