Re: pg_dump and thousands of schemas

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Hugo <hugo(dot)tech(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-05-28 22:26:36
Message-ID: 15138.1338243996@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> There is a quadratic behavior in pg_dump's "mark_create_done". This
> should probably be fixed, but in the meantime it can be circumvented
> by using -Fc rather than -Fp for the dump format. Doing that removed
> 17 minutes from the run time.

Hmm, that would just amount to postponing the work from pg_dump to
pg_restore --- although I suppose it could be a win if the dump is for
backup purposes and you probably won't ever have to restore it.
inhibit_data_for_failed_table() has the same issue, though perhaps it's
less likely to be exercised; and there is a previously noted O(N^2)
behavior for the loop around repoint_table_dependencies.

We could fix these things by setting up index arrays that map dump ID
to TocEntry pointer and dump ID of a table to dump ID of its TABLE DATA
TocEntry. The first of these already exists (tocsByDumpId) but is
currently built only if doing parallel restore. We'd have to build it
all the time to use it for fixing mark_create_done. Still, the extra
space is small compared to the size of the TocEntry data structures,
so I don't see that that's a serious objection.
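
A minimal sketch of the two index arrays described above, under stated assumptions: SketchTocEntry is a simplified, hypothetical stand-in for pg_dump's TocEntry, dump IDs are assumed to run from 1 to maxDumpId, and all struct, field, and function names here are invented for illustration and are not pg_dump's actual code. Both arrays are filled in a single pass over the entry list, after which finding a table's TABLE DATA entry is an array lookup rather than a scan of the whole list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for pg_dump's TocEntry. */
typedef struct SketchTocEntry
{
    int         dumpId;         /* unique ID, assumed to be in 1..maxDumpId */
    int         isTableData;    /* nonzero for a TABLE DATA entry */
    int         tableDumpId;    /* TABLE DATA only: dump ID of its table */
    int         created;        /* set when the table was created in this run */
    struct SketchTocEntry *next;
} SketchTocEntry;

typedef struct SketchIndex
{
    int         maxDumpId;
    SketchTocEntry **entryByDumpId;     /* dump ID -> entry, NULL if none */
    int        *tableDataIdByTable;     /* table dump ID -> TABLE DATA dump ID,
                                         * 0 if the table has no data entry */
} SketchIndex;

/* Release both arrays; safe to call after a failed build. */
static void
sketch_free_index(SketchIndex *idx)
{
    free(idx->entryByDumpId);
    free(idx->tableDataIdByTable);
    idx->entryByDumpId = NULL;
    idx->tableDataIdByTable = NULL;
}

/*
 * Build both index arrays in one pass over the list, i.e. O(N).
 * Returns 0 on success, -1 if memory allocation fails.
 */
static int
sketch_build_index(SketchIndex *idx, SketchTocEntry *head, int maxDumpId)
{
    SketchTocEntry *te;
    int         i;

    idx->maxDumpId = maxDumpId;
    idx->entryByDumpId = malloc(((size_t) maxDumpId + 1) * sizeof(SketchTocEntry *));
    idx->tableDataIdByTable = malloc(((size_t) maxDumpId + 1) * sizeof(int));
    if (idx->entryByDumpId == NULL || idx->tableDataIdByTable == NULL)
    {
        sketch_free_index(idx);
        return -1;
    }
    for (i = 0; i <= maxDumpId; i++)
    {
        idx->entryByDumpId[i] = NULL;
        idx->tableDataIdByTable[i] = 0;
    }

    for (te = head; te != NULL; te = te->next)
    {
        assert(te->dumpId > 0 && te->dumpId <= maxDumpId);
        idx->entryByDumpId[te->dumpId] = te;
        if (te->isTableData)
        {
            assert(te->tableDumpId > 0 && te->tableDumpId <= maxDumpId);
            idx->tableDataIdByTable[te->tableDumpId] = te->dumpId;
        }
    }
    return 0;
}

/*
 * Constant-time counterpart of a per-table linear search: find the TABLE DATA
 * entry for the given table and flag it as belonging to a table created in
 * this run.  Returns the entry, or NULL if the table has no data entry.
 */
static SketchTocEntry *
sketch_mark_create_done(SketchIndex *idx, int tableDumpId)
{
    SketchTocEntry *te;
    int         dataId;

    if (tableDumpId <= 0 || tableDumpId > idx->maxDumpId)
        return NULL;
    dataId = idx->tableDataIdByTable[tableDumpId];
    if (dataId == 0)
        return NULL;
    te = idx->entryByDumpId[dataId];
    if (te != NULL)
        te->created = 1;
    return te;
}
```

With N tables, calling a list-scanning lookup once per table costs on the order of N^2 steps in total, while the indexed version costs one O(N) build plus O(1) per table. The price is two arrays of maxDumpId + 1 slots each, which is the "extra space" weighed above against the size of the entries themselves.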

I have nothing else to do right now so am a bit tempted to go fix this.

> I'm working on a patch to reduce the LockReassignCurrentOwner problem
> in the server when using pg_dump with lots of objects.

Cool.

regards, tom lane
