From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Experimenting with hash tables inside pg_dump
Date: 2021-10-22 05:59:39
Message-ID: 20211022055939.z6fihsm7hdzbjttf@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2021-10-21 22:13:22 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I wonder though if for some of them we should instead replace the per-object
> > queries with one query returning the information for all objects of a type. It
> > doesn't make all that much sense that we build and send one query for each
> > table and index.
>
> The trick is the problem I alluded to in another thread: it's not safe to
> do stuff like pg_get_expr() on tables we don't have lock on.
I was looking at getTableAttrs() - sending one query instead of one query per
table yields a quite substantial speedup in a quick prototype. And I don't
think it changes anything around locking semantics.
> I've thought about doing something like
>
> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
>
> but in cases with tens of thousands of tables, it seems unlikely that
> that's going to behave all that nicely.
That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
made it unnest('{oid, oid, ...}'), which scales much better.
A pg_dump --schema-only of the regression database goes from

    real    0m0.675s
    user    0m0.039s
    sys     0m0.029s

to

    real    0m0.477s
    user    0m0.037s
    sys     0m0.020s

which isn't half-bad.
There are a few more cases like this, I think. But most are harder because the
dumping happens one-by-one from dumpDumpableObject(). The relatively easy but
substantial cases I could find quickly were getIndexes(), getConstraints(), and
getTriggers().
To see where it's worth putting in time it'd be useful if getSchemaData() in
verbose mode printed timing information...
> The *real* fix, I suppose, would be to fix all those catalog-inspection
> functions so that they operate with respect to the query's snapshot.
> But that's not a job I'm volunteering for. Besides which, pg_dump
> still has to cope with back-rev servers where it wouldn't be safe.
Yea, that's not a small change :(. I suspect that we'd need a bunch of new
caching infrastructure to make that reasonably performant, since this
presumably couldn't use syscache etc.
Greetings,
Andres Freund
Attachment: pg_dump-bulk-gettableattrs.diff (text/x-diff, 25.1 KB)