From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Craig James <cjames(at)emolecules(dot)com>, Hugo <hugo(dot)tech(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-05-25 15:40:04
Message-ID: CAMkU=1yjA-xQXnFTQVWa5jq-JPXyRdm5RXQQtADDm4T7vKhHoA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance
On Thu, May 24, 2012 at 8:54 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:
>> pg_dump itself seems to have a lot of quadratic portions (plus another
>> one on the server which it hits pretty heavily), and it's hard to know
>> where to start addressing them. It seems like addressing the overall
>> quadratic nature might be a globally better option, but addressing
>> just the problem with dumping one schema might be easier to kluge
>> together.
>
> Postgres 9.2 will have some speedups for pg_dump scanning large
> databases --- that might help.
Those speedups don't seem to apply here, though. I get the same
performance in 9.0.7 as in 9.2beta1.
There is an operation in pg_dump which is O(#_of_schemata_in_db *
#_of_tables_in_db), or something like that.
The attached very crude patch reduces that to
O(log(#_of_schemata_in_db) * #_of_tables_in_db).
I was hoping this would be a general improvement. It doesn't seem to be.
But it is a very substantial improvement in the specific case of
dumping one small schema out of a very large database.
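To make the shape of that change concrete, here is a minimal sketch of the
technique, using hypothetical names rather than the actual pg_dump symbols
(the real change is in the attached patch): sort the namespace list once by
OID, and then each per-table namespace lookup becomes a bsearch() instead of
a linear scan, which is where the log factor comes from.

#include <stdlib.h>

typedef unsigned int Oid;

typedef struct
{
    Oid         nspoid;     /* pg_namespace OID */
    char       *nspname;    /* namespace name */
} NamespaceEntry;

/* Compare two entries by namespace OID, for qsort()/bsearch(). */
static int
namespace_oid_cmp(const void *a, const void *b)
{
    Oid     oa = ((const NamespaceEntry *) a)->nspoid;
    Oid     ob = ((const NamespaceEntry *) b)->nspoid;

    return (oa < ob) ? -1 : (oa > ob) ? 1 : 0;
}

/* Sort the namespace list once, after loading it from the catalogs... */
static void
sort_namespaces(NamespaceEntry *nsps, size_t n)
{
    qsort(nsps, n, sizeof(NamespaceEntry), namespace_oid_cmp);
}

/* ...so each per-table lookup is O(log n) rather than O(n). */
static NamespaceEntry *
find_namespace(NamespaceEntry *nsps, size_t n, Oid target)
{
    NamespaceEntry key;

    key.nspoid = target;
    key.nspname = NULL;
    return bsearch(&key, nsps, n, sizeof(NamespaceEntry), namespace_oid_cmp);
}

Again, that is only an illustration of the technique; the actual code is in
the attachment.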
It seems like dumping one schema would be better optimized by not
loading up the entire database catalog, but rather by restricting to
just that schema at the catalog stage. But I haven't dug into those
details.
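To sketch what I mean by restricting at the catalog stage (this is purely
hypothetical, not what pg_dump does today), the schema filter could be pushed
into the catalog query itself, so the server never hands back the tables of
the other thousands of schemata. Something along these lines with libpq:

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Hypothetical sketch: fetch only the ordinary tables of one schema by
 * filtering in the catalog query, instead of reading all of pg_class and
 * filtering on the client side.
 */
static void
list_tables_in_schema(PGconn *conn, const char *schema)
{
    const char *params[1];
    PGresult   *res;
    int         i;

    params[0] = schema;
    res = PQexecParams(conn,
                       "SELECT c.oid, c.relname "
                       "FROM pg_class c "
                       "JOIN pg_namespace n ON n.oid = c.relnamespace "
                       "WHERE n.nspname = $1 AND c.relkind = 'r'",
                       1, NULL, params, NULL, NULL, 0);

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (i = 0; i < PQntuples(res); i++)
            printf("%s.%s\n", schema, PQgetvalue(res, i, 1));
    }
    PQclear(res);
}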
For dumping entire databases, it looks like the biggest problem is
going to be LockReassignCurrentOwner in the server. And that doesn't
seem to be easy to fix, as any change to it to improve pg_dump will
risk degrading normal use cases.
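As a toy model of why that hurts (this is not the backend code, just my
reading of the pattern): if each command's resource-owner cleanup has to walk
every lock the session currently holds, then taking one lock per table and
cleaning up after every command adds up quadratically:

#include <stdio.h>

/*
 * Toy model only: each of n_tables commands adds one held lock and then
 * triggers a cleanup pass that scans all locks held so far, so the total
 * work is 1 + 2 + ... + n, i.e. O(n^2).
 */
int
main(void)
{
    long long   n_tables = 100000;
    long long   held = 0;
    long long   scanned = 0;
    long long   i;

    for (i = 0; i < n_tables; i++)
    {
        held++;             /* one more AccessShareLock held */
        scanned += held;    /* cleanup pass walks every held lock */
    }

    printf("locks held: %lld, lock-table entries scanned: %lld\n",
           held, scanned);
    return 0;
}

With 100,000 tables that is roughly 5 billion lock-table entries visited in
total, even though only 100,000 locks are ever taken.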
If we want to be able to efficiently dump entire databases in a
scalable way, it seems like there should be some way to obtain a
database-wide AccessShare lock, which blocks AccessExclusive locks on
any object in the database, and turns ordinary object-level
AccessShare lock requests into no-ops. I don't think you can get
hundreds of thousands of simultaneously held and individually recorded
AccessShare locks without causing bad things to happen.
Cheers,
Jeff
Attachment: pg_dump_ns_search.patch (application/octet-stream, 2.2 KB)