From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-27 09:31:48
Message-ID: 1217151108.3894.1218.camel@ebony.2ndQuadrant
Lists: pgsql-hackers pgsql-patches
On Sat, 2008-07-26 at 13:56 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I want to dump tables separately for performance reasons. There are
> > documented tests showing 100% gains using this method. There is no gain
> > adding this to pg_restore. There is a gain to be had - parallelising
> > index creation, but this patch doesn't provide parallelisation.
>
> Right, but the parallelization is going to happen sometime, and it is
> going to happen in the context of pg_restore.
I honestly think there is less benefit that way than if we consider
things more as a whole:
To dump data quickly we need to dump different tables to different
disks simultaneously. By its very nature, that cannot end with just a
single file. So the starting point for any restore must potentially be
more than one file.
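As an illustration, a per-table dump split across spindles might look like this (the table names, mount points and database name are invented for the example):

```shell
# Sketch only: dump each large table to a different disk, in parallel.
# Table names, paths and database name are hypothetical.
pg_dump -Fc --table=orders   --file=/disk1/orders.dump   mydb &
pg_dump -Fc --table=lineitem --file=/disk2/lineitem.dump mydb &
pg_dump -Fc --table=parts    --file=/disk3/parts.dump    mydb &
wait
# Result: three archive files, not one -- so restore has to be able
# to start from multiple input files.
```

Note that, run naively like this, the three sessions each take their own snapshot and so see three different points in time - which is why the sessions also need a way to share one snapshot.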
There are two ways of dumping: either multi-thread pg_dump, or allow
multiple pg_dumps to work together. The second option is much less work
for the same result. (Either way, we also need a way for multiple
concurrent sessions to share a snapshot.)
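For reference, one shape the snapshot-sharing piece could take is the pg_export_snapshot()/SET TRANSACTION SNAPSHOT interface (this is the mechanism PostgreSQL eventually shipped, in 9.2; at the time of this mail it was still an open item, and the snapshot identifier below is invented). The two sessions are shown as separate psql calls only for readability - in practice the exporting session must stay connected for as long as the snapshot is in use:

```shell
# Session A: open a transaction and publish its snapshot.
psql -d mydb <<'SQL'
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();  -- returns an id such as '00000003-0000001B-1'
-- keep this transaction open while the worker sessions run
SQL

# Sessions B..N: adopt the exported snapshot, so every dump session
# sees exactly the same data.
psql -d mydb <<'SQL'
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
-- ... COPY the tables assigned to this session here ...
SQL
```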
When restoring, we can then just use multiple pg_restore sessions to
restore the individual data files. Or again, we could write a
multi-threaded pg_restore to do the same thing - but why would I bother
doing that when I already can? It gains us nothing.
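Under that scheme the restore side needs nothing new - several plain pg_restore sessions simply run side by side, one per archive (file names and database again hypothetical):

```shell
# Sketch: one existing pg_restore session per data file, concurrently.
pg_restore -d mydb /disk1/orders.dump &
pg_restore -d mydb /disk2/lineitem.dump &
pg_restore -d mydb /disk3/parts.dump &
wait
```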
Parallelising the index creation seems best done using concurrent psql.
We've agreed on some modifications to psql to support multiple sessions.
If we do that right, then we can make pg_restore generate a psql script
with multi-session commands scattered appropriately throughout.
Parallel pg_restore is a lot of work for a narrow use case. Concurrent
psql provides a much wider set of use cases.
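Until concurrent psql exists, the effect such a script would have can be approximated by launching independent sessions from the shell (index, table and database names are made up for illustration):

```shell
# Build indexes on different tables in separate sessions, concurrently.
psql -d mydb -c "CREATE INDEX orders_date_idx ON orders (order_date);" &
psql -d mydb -c "CREATE INDEX lineitem_part_idx ON lineitem (part_id);" &
wait
# A concurrent-psql-aware pg_restore would emit the equivalent
# multi-session commands directly in its generated script.
```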
So fully parallelising dump/restore can be achieved by
* splitting dump into pieces (this patch)
* allowing sessions to share a common snapshot
* concurrent psql
* changes to pg_restore/psql/pg_dump to allow commands to be inserted
which will use concurrent psql features
If we do things this way then we have some useful tools that can be used
in a range of use cases, not just restore.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
| | From | Date | Subject |
|---|---|---|---|
| Next Message | Simon Riggs | 2008-07-27 09:37:34 | Re: pg_dump additional options for performance |
| Previous Message | Tatsuo Ishii | 2008-07-27 08:47:11 | Re: WITH RECUSIVE patches 0723 |