Quick Links

Re: Can we get rid of repeated queries from pg_dump?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	depesz(at)depesz(dot)com, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject:	Re: Can we get rid of repeated queries from pg_dump?
Date:	2021-08-30 16:42:39
Message-ID:	1414363.1630341759@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-hackers

I wrote:
> Ah-hah. Those 1729 extension-owned functions account nicely
> for the extra probes into pg_proc, and I bet they are causing
> the unexplained getFormattedTypeName calls too. So the
> *real* problem here seems to be that we're doing too much
> work on objects that are not going to be dumped because they
> are extension members. I'll take a look at that later if
> nobody beats me to it.

I took a quick look at this, and it seems to be mostly the fault
of the DUMP_COMPONENT refactorization that was done awhile ago.
We create DumpableObjects for all these objects, because we need
those to track dependencies. When we arrive at dumpFunc() for
an extension-owned object, it has the DUMP_COMPONENT_SECLABEL
and DUMP_COMPONENT_POLICY flag bits set, whether or not the
function actually has any such properties. This causes dumpFunc
to run through its data collection query, even though nothing
at all is going to get output.

I see that the reason those flags become set is that
checkExtensionMembership does this for an extension member:

dobj->dump = ext->dobj.dump_contains & (DUMP_COMPONENT_ACL |
DUMP_COMPONENT_SECLABEL |
DUMP_COMPONENT_POLICY);

There is logic elsewhere that causes the DUMP_COMPONENT_ACL flag
to get cleared if there's no interesting ACL for the object, but
I see no such logic for SECLABEL or POLICY. That omission is costing
us an awful lot of wasted queries in any database with a lot of
extension-owned objects.

I'm quite allergic to the way that the ACL logic is implemented anyhow,
as there seem to be N copies of essentially identical logic, not to
mention all the inefficient left joins and subqueries that were added
to the fundamental data-gathering queries --- which are only supposed
to find out which objects we want to dump, not expensively collect
scads of detail about every object in the catalogs. I think this is
less in need of a tweak than "burn it to the ground and start over".
I wonder if we can't get to a place where there's only one query that
actually looks into pg_init_privs, more like the way we do it for
descriptions and seclabels (not that the seclabel code is perfect,
as we've found here).

Anyway, it doesn't look like there's much hope of improving this
aspect without a significant rewrite. One band-aidy idea is that
we could check --no-security-labels earlier and not allow that
flag bit to become set in the first place, but that only helps if
the user gives that flag, which few would. (I'm also wondering
more than a little bit why we're allowing DUMP_COMPONENT_POLICY
to become set on objects that aren't tables.)

regards, tom lane

In response to

Re: Can we get rid of repeated queries from pg_dump? at 2021-08-30 14:58:07 from Tom Lane

Responses

Re: Can we get rid of repeated queries from pg_dump? at 2021-10-20 21:46:01 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2021-08-30 16:53:25	Re: FW: vacuumlo
Previous Message	Ian Dauncey	2021-08-30 16:23:55	FW: vacuumlo

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2021-08-30 16:54:36	Re: Possible missing segments in archiving on standby
Previous Message	Aleksander Alekseev	2021-08-30 16:09:14	Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?