Re: Statistics Import and Export

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-02-21 22:37:18
Message-ID: blezfpeafycxlizyynwvwzh2vywmklhvfqudhicjrccu4raqpx@4ttfufowc2vo
Lists: pgsql-hackers

Hi,

On 2025-02-21 16:24:38 -0500, Tom Lane wrote:
> Oy. Those are outright horrid, even without any consideration of
> pre-preparing them. We know the OID of the table we want to dump,
> we should be doing "FROM pg_class WHERE oid = whatever" and lose
> the join to pg_namespace altogether. The explicit casts to regclass
> are quite expensive too to fetch information that pg_dump already
> has. It already knows the server version, too.

> Moreover, the first of these shouldn't be a separate query at all.
> I objected to fetching pg_statistic content for all tables at once,
> but relpages/reltuples/relallvisible is a pretty small amount of
> new info. We should just collect those fields as part of getTables'
> main query of pg_class (which, indeed, is already fetching relpages).

> On the second one, if we want to go through the pg_stats view then
> we can't rely on table OID, but I don't see why we need the joins
> to anything else. "WHERE s.schemaname = 'x' AND s.tablename = 'y'"
> seems sufficient.

Agreed on all those.
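
To make the shape of those simplified queries concrete, here is a rough sketch. The helper names are hypothetical (nothing like them exists in pg_dump), the `$1`/`$2` placeholders assume a parameterized or prepared statement, and `SELECT *` just stands in for whatever column list is actually needed. Per the point quoted above, the relation-level fields would ideally be folded into getTables' existing pg_class query instead of being fetched separately; the first query only shows the lookup by OID without the pg_namespace join.

```python
# Sketch only: build_relstats_query and build_attstats_query are
# hypothetical helper names, not functions from pg_dump.

def build_relstats_query():
    """Relation-level stats looked up by table OID, with no pg_namespace join."""
    return (
        "SELECT relpages, reltuples, relallvisible "
        "FROM pg_catalog.pg_class "
        "WHERE oid = $1"
    )


def build_attstats_query():
    """Attribute-level stats via the pg_stats view, filtered by name only."""
    return (
        "SELECT * "
        "FROM pg_catalog.pg_stats s "
        "WHERE s.schemaname = $1 AND s.tablename = $2"
    )


if __name__ == "__main__":
    print(build_relstats_query())
    print(build_attstats_query())
```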

> I wonder whether we ought to issue different queries depending on
> whether we're superuser. The pg_stats view is rather expensive
> because of its security restrictions, and if we're superuser we
> could just look directly at pg_statistic. Maybe those checks are
> fast enough not to matter, but ...

It doesn't seem to make much of a difference, from what I can tell.

At execution time, most of the time is spent in
a) the joins to pg_attribute and pg_class (the ones in pg_stats)
b) array_out().

The times get way worse if you dump stats for catalog tables, because there
some of the arrays are of type regproc, and regprocout calls
FuncnameGetCandidates(), which then ends up iterating over a long cached
list... I think that's basically O(N^2)?

Of course that's nothing we should encounter frequently, but ugh.
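
As a toy model of why that would be quadratic (this is not PostgreSQL code, and it assumes each output call walks the whole list, and that the list length grows in step with the number of values being output):

```python
# Toy model, not PostgreSQL code. Assumption: every output call scans a
# list from start to end, and the list is about as long as the number of
# values being output.

def total_scan_steps(num_values, list_length):
    """Count list entries visited across num_values full scans."""
    steps = 0
    for _ in range(num_values):
        for _ in range(list_length):
            steps += 1
    return steps


if __name__ == "__main__":
    for n in (100, 200, 400):
        # Doubling n quadruples the total work.
        print(n, total_scan_steps(n, n))
```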

Greetings,

Andres Freund
