From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2025-03-06 03:41:24 |
Message-ID: | xe3yjkcthy7f377zjayuckxyd62z5cekgjau3j4vp6elbicson@cx2uiilwmxyy |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-03-05 22:00:42 -0500, Corey Huinker wrote:
> On Wed, Mar 5, 2025 at 9:18 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2025-03-05 20:54:35 -0500, Corey Huinker wrote:
> > > It's been considered and not ruled out, with a "let's see how the simple
> > > thing works, first" approach. Considerations are:
> > >
> > > * pg_stats is keyed on schemaname + tablename (which can also be indexes)
> > > and we need to use that because of the security barrier
> >
> > I don't think that has to be a big issue, you can just make the the query
> > query multiple tables at once using an = ANY(ARRAY[]) expression or such.
> >
>
> I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you
> suggesting we back the joins from pg_stats to pg_namespace and pg_class and
> then filter by oids?
I was thinking of one query per schema or something like that. But yea, a
query to pg_namespace and pg_class wouldn't be a problem if we did it far
fewer times than before. Or you could put the list of catalogs / tables to
be queried into an unnest() with two arrays or such.
Not sure how good the query plan for that would be, but it may be worth
looking at.
> > > * The stats data is kinda heavy (most common value lists, most common
> > > elements lists, esp for high stattargets), which would be a considerable
> > > memory impact and some of those stats might not even be needed (example,
> > > index stats for a table that is filtered out)
> >
> > Doesn't the code currently have this problem already? Afaict the stats are
> > currently all stored in memory inside pg_dump.
> >
>
> Each call to getAttributeStats() fetches the pg_stats for one and only one
> relation and then writes the SQL call to fout, then discards the result set
> once all the attributes of the relation are done.
I don't think that's true. For one my example demonstrated that it increases
the peak memory usage substantially. That'd not be the case if the data was
just written out to stdout or such.
Looking at the code confirms that. The ArchiveEntry() in dumpRelationStats()
is never freed, afaict. And ArchiveEntry() strdups ->createStmt, which
contains the "SELECT pg_restore_attribute_stats(...)".
> I don't think the query itself would be a problem, a query querying all the
> > required stats should probably use PQsetSingleRowMode() or
> > PQsetChunkedRowsMode().
>
>
> That makes sense if we get the attribute stats from the result set in the
> order that we need them, and I don't know how we could possibly do that.
> We'd still need a table to bsearch() and that would be huge.
I'm not following - what would be the problem with a bsearch()? Compared to
the stats data an array to map from oid to an index in an array of stats data
data would be very small.
But with the unnest() idea from above it wouldn't even be needed, you could
use
SELECT ...
FROM unnest(schema_array, table_array) WITH ORDINALITY AS src(schemaname, tablename)
...
ORDER BY ordinality
or something along those lines.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Ajin Cherian | 2025-03-06 03:47:52 | Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided. |
Previous Message | jian he | 2025-03-06 03:04:50 | Re: support fast default for domain with constraints |