Re: pg_dump additional options for performance

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-26 16:43:55
Message-ID: 20080726164355.GH16005@tamriel.snowman.net
Lists: pgsql-hackers pgsql-patches

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
> >> The key problem is that pg_restore is broken:
>
> > The key capability here is being able to split the dump into multiple
> > pieces. The equivalent capability on restore is *not* required, because
> > once the dump has been split the restore never needs to be.
>
> This argument is nonsense. The typical usage of this capability, IMHO,
> will be
>
> pg_dump -Fc >whole.dump
> pg_restore --schema-before-data whole.dump >before.sql
> pg_restore --data-only whole.dump >data.sql
> pg_restore --schema-after-data whole.dump >after.sql
>
> followed by editing the schema pieces and then loading.

I dislike, and doubt that I'd use, this approach. At the end of the
day, it ends up processing the same (very large amount of) data multiple
times. We have >60G dump files sometimes, and there's no way I'm going
to dump that into a single file first if I can avoid it. What we end up
doing today is --schema-only, followed by vi'ing the result and splitting
it up by hand, etc., then doing a separate --data-only dump.
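
For what it's worth, that workflow looks roughly like this (database and
file names here are just illustrative):

    pg_dump --schema-only mydb >schema.sql
    # edit/split schema.sql by hand (vi, etc.) into before- and after-data pieces
    pg_dump --data-only mydb >data.sql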

> One reason
> is that this gives you a consistent dump, whereas three successive
> pg_dump runs could never guarantee any such thing.

While this is technically true, in most cases people have control over
the schema and can ensure that it doesn't change between runs. At that
point only the data is at issue, and the data dump is still done in a
transactional way.
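
(For comparison, the successive-runs approach being discussed would
presumably look something like this, assuming the pg_dump options
proposed in this patch:

    pg_dump --schema-before-data mydb >before.sql
    pg_dump --data-only mydb >data.sql
    pg_dump --schema-after-data mydb >after.sql

Each run is transactionally consistent on its own; it's only schema
changes between runs that could cause a mismatch.)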

> Another reason is that you may well not know when you prepare the
> dump that you will need split output, because the requirement to edit
> the dump is likely to be realized only when you go to load it.

This is a good point. My gut reaction is that, at least in my usage, it
would be more about "if it's larger than a gig, I might as well split it
out, just in case I need to touch something". Honestly, it's rare that
I don't have to make *some* change. Often that's the point of dumping
it out.

Thanks,

Stephen
