backup/restore - another area.

From: Jeff <threshar(at)torgo(dot)978(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: backup/restore - another area.
Date: 2003-10-09 19:34:03
Message-ID: Pine.BSF.4.44.0310091517320.71723-100000@torgo.978.org
Lists: pgsql-admin pgsql-performance

Boy, I must be getting annoying by now, huh?

Anyway, after the joys of Solaris being fast, I'm moving on to another area
- backup & restore. I've been checking the archives and haven't seen any
"good" tips for backing up big databases (and, more importantly,
restoring them).

I've noticed while doing a backup (with both -Fc and the regular recipe) that
my IO is nowhere near being stressed. According to vmstat, it sits
around reading about 512kB/sec (with occasional spikes), and every 5-6
seconds it writes out a 3MB hunk.
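
For the record, the runs were along the lines of the following (database
name illustrative), with vmstat watching from a second terminal:

    # custom-format dump and plain-text dump of the same database
    pg_dump -Fc mydb > /backup/mydb.dump
    pg_dump mydb > /backup/mydb.sql

    # in another terminal: report IO activity once a second
    vmstat 1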

So as a test I decided to cp a 1GB file and got a constant read speed of
20MB/sec, and the writes, well, were more sporadic (buffering, most likely)
- it would write out 60MB every 3 seconds.
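
That is, something along these lines (file and destination illustrative),
again with vmstat running alongside:

    # time a straight 1GB file copy through the same filesystem
    time cp /data/bigfile /backup/bigfile
    sync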

And then, on the restore, I notice similar things - IO hardly being
stressed at all... reading in at ~512kB/sec and every now and then writing
out a few MB.
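
The restores were the usual (names illustrative again):

    # restore the custom-format dump into a fresh database
    createdb mydb
    pg_restore -d mydb /backup/mydb.dump

    # or replay the plain SQL dump
    psql mydb < /backup/mydb.sql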

So, I've been thinking of various backup/restore strategies... some I'm
sure some people already do, some need code written and may be controversial...

Idea #1:
Use an LVM, take a snapshot, and archive that.
From the way I see it, the downside is that the LVM will use a lot of space
until the snapshot is removed. Also, PG may be in a slightly inconsistent
state - but this should "appear" to PG the same as if the power went out.

For restore, simply unarchive this snapshot and point postgres at it. Let
it recover and you are good to go.

Little overhead from what I see...
I'm leaning towards this method the more I think of it.
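
Roughly, I'm picturing something like this - a sketch only, assuming
LVM2-style tools, and with the volume/mount/path names made up:

    # snapshot the volume holding $PGDATA (names hypothetical)
    lvcreate --snapshot --size 2G --name pgsnap /dev/vg0/pgdata
    mount -o ro /dev/vg0/pgsnap /mnt/pgsnap

    # archive the frozen copy, then drop the snapshot
    tar czf /backup/pgdata-snap.tar.gz -C /mnt/pgsnap .
    umount /mnt/pgsnap
    lvremove -f /dev/vg0/pgsnap

    # restore: unpack and let PG run crash recovery, as after a power loss
    tar xzf /backup/pgdata-snap.tar.gz -C /path/to/new/pgdata
    pg_ctl -D /path/to/new/pgdata start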

Idea #2:

A new program/internal "system". Let's call it pg_backup. It would generate
a very fast backup (that restores very fast) at the expense of disk space.
Pretty much what we would do is write out new copies of all the pages in
the db - both indexes and tables.

The pros of this are that it does not depend on an LVM and is therefore
accessible on all platforms. It also has the other benefits mentioned
above, except speed.

For a restore, PG would need something like a 'restore mode' where we can
just have it pump pages into it somehow. It would not have to build
indexes, check constraints, and all that, because by definition the backup
would contain valid data.
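
Nothing like that exists today as far as I know; the closest you can get
right now is a cold file-level copy, which at least shows the page-copying
idea (sketch, data directory path illustrative):

    # stop the postmaster so the page files are consistent, copy, restart
    pg_ctl -D /usr/local/pgsql/data stop -m fast
    tar cf /backup/pgdata-cold.tar -C /usr/local/pgsql/data .
    pg_ctl -D /usr/local/pgsql/data start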

The downside to both of these is that the backup is only good for that
version of PG on that architecture. Speaking from the Informix world, this is
how it is - Informix has a fast backup & fast restore that do essentially #2,
and then it has export/import options (which work like our current pg_dump
and restore).

And oh yeah - I've tried disabling fsync on load, and while it did go faster,
it was only 2 minutes faster (9m vs 11m).
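
(That was just flipping the setting in postgresql.conf for the duration of
the load and restarting - remembering to turn it back on afterwards.)

    # postgresql.conf - only while bulk loading; re-enable when done!
    fsync = false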

Any thoughts on this? What do you other folks with big DBs do?

--
Jeff Trout <jeff(at)jefftrout(dot)com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
