From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mike Roest <mike(dot)roest(at)replicon(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump incredibly slow dumping a single schema from a large db
Date: 2012-03-31 16:35:53
Message-ID: 21078.1333211753@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers
Mike Roest <mike(dot)roest(at)replicon(dot)com> writes:
> The file is 6 megs so I've dropped it here.
> That was doing perf for the length of the pg_dump command and then a perf
> report -n
> http://dl.dropbox.com/u/13153/output.txt
Hmm ... that's a remarkably verbose output format, but the useful part
of this info seems to be just:
# Events: 2M cpu-clock
#
# Overhead Samples Command Shared Object Symbol
# ........ .......... ............... ............................. .................................................
#
65.96% 1635392 pg_dump pg_dump [.] findLoop
|
--- findLoop
|
|--69.66%-- findDependencyLoops
| sortDumpableObjects
| main
| __libc_start_main
|
|--30.34%-- findLoop
| findDependencyLoops
| sortDumpableObjects
| main
| __libc_start_main
--0.00%-- [...]
10.16% 251955 pg_dump pg_dump [.] getTables
|
--- getTables
getSchemaData
main
__libc_start_main
The file also shows that findObjectByDumpId() accounts for only a
negligible percentage of runtime, which means that the recursion path in
findLoop() is hardly being executed at all. After staring at that for
a while, I realized that the O(N^2) part is the initial "is the
object in the workspace?" test. The actual dependency chains are
probably never very long, but as we run through the collection of
objects we gradually add all of them to the front of the workspace.

So this is dumb; we should manage the "is the object already processed"
component of that with an O(1) check, like a bool array or some such,
rather than an O(N) search loop.
As for the getTables slowdown, the only part of that I can see that
looks to be both significant and entirely contained in getTables()
itself is the nested loop near the end that's trying to copy
the dumpable flags for owned sequences from their owning tables.
Do you have a whole lot of owned sequences? Maybe we could postpone
that work until we have the fast table lookup array constructed,
which should reduce this from O(M*N) to O(M*logN).
regards, tom lane