Re: pg_restore and create FK without verification check

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, ow <oneway_111(at)yahoo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore and create FK without verification check
Date: 2003-11-26 15:23:47
Message-ID: 14914.1069860227@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> This is somewhat complementary to WAL and PITR. I'm seeking for a fast
> way to dump and restore a complete database, like physical file copy,
> without shutting down the backend. I was thinking of a BACKUP command
> that streams out the files including any indexes and non-vacuumed
> tuples. A database recreated from that wouldn't be as clean as a
> pg_dump/pg_restored database, but it would be up much faster, and there
> wouldn't be any dependency problem.

It's already intended to support this as part of the PITR work. The
idea is you force a checkpoint and then make a tar-format dump of the
database tree (tar or whatever floats your boat, but anyway a
filesystem-level backup). The database need not be stopped while you do
this, and you don't need a filesystem that can do snapshots or anything
fancy like that. The tar dump itself most likely does not represent a
consistent state of the database by the time you are done making it.
That is okay, because you have also been archiving off to tape (or
someplace) all the WAL data generated since that pre-dump checkpoint.
You can continue archiving the WAL series for however far forward from
the original dump you feel like. If you need to recover, you reload the
database from the tar dump and then replay the WAL series against it.
This is indistinguishable from a crash recovery situation --- the
"inconsistent" tar dump looks just like a disk that has received some
but not all of the updates since the last checkpoint. Replay will fix it.

The cool thing about this is you can actually bring the DB to the state
it was in at any specific point in time covered by your WAL archive ---
just run the WAL replay as far as you want, then stop. Solves the
"junior DBA deleted all my data Saturday morning" problem, thus "PITR".
Now the uncool thing is you need massive amounts of secondary storage to
archive all that WAL data, if your installation has heavy update
activity. But it seems to me it would address the need you mention
above --- you'd just not bother to continue archiving WAL past the end
of the dump operation.

In principle you could do this today, but we don't have enough
support code in place to make it work smoothly, eg WAL segment files
aren't labeled with enough identifying information to let you manage
an archive full of 'em. Still it doesn't seem that far away.

> This doesn't really replace pg_dump/pg_restore, because it probably
> wouldn't be able to upgrade a cluster.

Right, any such physical dump would be limited to restoring a whole
cluster as-is: no imports into other clusters, no selectivity, no fancy
games. The main reason is you'd have to dump and restore pg_clog along
with the data files.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shridhar Daithankar 2003-11-26 15:23:55 Re: Providing anonymous mmap as an option of sharing memory
Previous Message Zeugswetter Andreas SB SD 2003-11-26 15:12:50 Re: pg_restore and create FK without verification check