Re: pg_dump doesn't dump binary compatible casts

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump doesn't dump binary compatible casts
Date: 2003-09-24 16:39:53
Message-ID: 87wubyjg2u.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


"Matthew T. O'Connor" <matthew(at)zeut(dot)net> writes:

> On Tue, 2003-09-23 at 22:40, Greg Stark wrote:
> > [But then I'm not a fan of treating pg_dump files as if they were backups.]
>
> If you don't use pg_dump for backups what do you use? Stop the database
> and copy the data directory? That is not a valid choice for most people.

It's currently the only way to do a real backup though. I haven't gone to
production yet with my first postgres project but that's certainly what I'm
planning to recommend to my client.

pg_dump files are not backups. When you've restored a pg_dump you don't
necessarily get back the exact same data you had when you dumped it. Every
piece of data is serialized and then unserialized, space has to be allocated
anew, the DDL has to be reinterpreted. pg_dump has to spend the time to
rebuild tables and indexes too. A straight physical backup doesn't have to do
any of that.

pg_dumps are a logical export, not a physical backup. A lot of work goes into
guaranteeing pg_dump/pg_restore reliably generates equivalent databases, but
even so there are always corner cases as the original post describes. What if
there's a bug and some piece of data doesn't unserialize correctly (arrays
didn't in 7.3 if their lower bound wasn't 1 for example). Or if there's a bug
and the database core dumps trying to rebuild indexes? Or for that matter if
the machine available just wasn't powerful enough to generate the indexes at
all.

I would tend to prefer pg_dump/pg_restore files if they were an option because
the resulting database would be more compact, the indexes more balanced, and
any data corruption would be cleaned up. but I wouldn't feel safe unless I had
a real physical backup ready to use in case the pg_dump didn't work, or took
too long.

Every time I've tried to restore from a pg_dump I've run into small problems
like this. That's fine when I'm doing development, but in a crisis you have to
have an image you can restore and be sure it'll be *exactly* the same as it
was before the failure.

Fwiw, there is a hardware solution to doing 24x7 operation without online
backups. You store the database on a raid mirror (preferably a 3-way mirror so
you're never without mirroring). When the time comes to do the backup you
break the mirror, use the inactive side to do the backup. The backup you get
is exactly what you would have gotten had the machine crashed at the point in
time that you broke the mirror. We used to do something similar even with
Oracle because it was easier and faster than its hot backup system. This
solution requires money for good raid controllers and lots of disks though.

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tim McAuley 2003-09-24 16:40:04 sequence's plpgsql
Previous Message Joe Conway 2003-09-24 16:21:23 Re: creating/accessing new runtime parameters