From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-03-06 04:04:29
Message-ID: CADkLM=f1n2_Vomq0gKab7xdxDHmJGgn=DE48P8fzQOp3Mrs1Qg@mail.gmail.com
Lists: pgsql-hackers
>
> > I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are
> > you suggesting we back the joins from pg_stats to pg_namespace and
> > pg_class and then filter by oids?
>
> I was thinking of one query per schema or something like that. But yea, a
> query to pg_namespace and pg_class wouldn't be a problem if we did it far
> fewer times than before. Or you could put the list of catalogs / tables
> to be queried into an unnest() with two arrays or such.
>
> Not sure how good the query plan for that would be, but it may be worth
> looking at.
>
Ok, so we're willing to take the pg_class/pg_namespace join hit for one or
a handful of queries, good to know.
> > Each call to getAttributeStats() fetches the pg_stats for one and only
> > one relation and then writes the SQL call to fout, then discards the
> > result set once all the attributes of the relation are done.
>
> I don't think that's true. For one, my example demonstrated that it
> increases the peak memory usage substantially. That'd not be the case if
> the data was just written out to stdout or such.
>
> Looking at the code confirms that. The ArchiveEntry() in
> dumpRelationStats() is never freed, afaict. And ArchiveEntry() strdups
> ->createStmt, which contains the "SELECT pg_restore_attribute_stats(...)".
>
Pardon my inexperience, but aren't the ArchiveEntry records needed right up
until the end of the program's run? If there's value in freeing them, why
isn't it being done already? What else would consume the freed memory?
>
> > > I don't think the query itself would be a problem, a query querying
> > > all the required stats should probably use PQsetSingleRowMode() or
> > > PQsetChunkedRowsMode().
> >
> > That makes sense if we get the attribute stats from the result set in
> > the order that we need them, and I don't know how we could possibly do
> > that. We'd still need a table to bsearch() and that would be huge.
>
> I'm not following - what would be the problem with a bsearch()? Compared
> to the stats data, an array to map from oid to an index in an array of
> stats data would be very small.
>
If we can do oid bsearch lookups, then we might be in business, but even
then we have to maintain an in-memory data structure of all the pg_stats
records relevant to this dump: either in PGresult form, in an intermediate
data structure like tblinfo/indxinfo, or as the resolved string of
pg_restore_attribute_stats() calls for that relation...which then gets
strdup'd into the ArchiveEntry that we have to maintain anyway.
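
For what it's worth, the mapping you describe could be as small as an array
of (oid, index) pairs, sorted once and probed with bsearch(). A minimal
sketch, with hypothetical names and not actual pg_dump code:

/*
 * Sketch of a small oid -> stats-array-index map: sort once with qsort(),
 * probe with bsearch().  Hypothetical names, not pg_dump code.
 */
#include <stdlib.h>

typedef struct OidIdxPair
{
    unsigned int oid;   /* relation OID */
    int          idx;   /* index into a separate array of stats data */
} OidIdxPair;

static int
oididx_cmp(const void *a, const void *b)
{
    unsigned int ao = ((const OidIdxPair *) a)->oid;
    unsigned int bo = ((const OidIdxPair *) b)->oid;

    return (ao > bo) - (ao < bo);
}

/*
 * Returns the stats index for oid, or -1 if not found.  pairs[] must
 * already be sorted: qsort(pairs, npairs, sizeof(*pairs), oididx_cmp).
 */
static int
lookup_stats_index(const OidIdxPair *pairs, size_t npairs, unsigned int oid)
{
    OidIdxPair  key = {oid, 0};
    const OidIdxPair *hit;

    hit = bsearch(&key, pairs, npairs, sizeof(OidIdxPair), oididx_cmp);
    return hit ? hit->idx : -1;
}

Even for a very large catalog that array is tiny next to the stats data
itself, which I take to be your point; my concern is what the "array of
stats data" it points into would cost us.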
>
> But with the unnest() idea from above it wouldn't even be needed, you
> could use
>
> SELECT ...
> FROM unnest(schema_array, table_array) WITH ORDINALITY
>      AS src(schemaname, tablename)
> ...
> ORDER BY ordinality
>
> or something along those lines.
>
This still seems to assume we have the freedom to generate a batch of these
rows, write them out, discard them, and then move on to the next logical
batch (perhaps by schema, as you suggested earlier), and I don't know that
we do. Perhaps we would have that freedom if stats were the absolute last
thing loaded in a dump.
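
To make that concrete, here's a minimal sketch of what one such batch could
look like, assuming PostgreSQL 17's chunked-rows mode; the function name,
query text, and array-literal encoding are illustrative only, not actual
pg_dump code:

/*
 * Sketch: fetch stats for one batch of relations, passed as two text
 * arrays, and consume the result in chunks so the full result set never
 * sits in memory at once.  Illustrative only, not pg_dump code.
 */
#include <libpq-fe.h>

static void
dump_stats_batch(PGconn *conn,
                 const char *schema_array,   /* e.g. "{public,public}" */
                 const char *table_array)    /* e.g. "{t1,t2}" */
{
    const char *query =
        "SELECT s.* "
        "FROM unnest($1::text[], $2::text[]) WITH ORDINALITY "
        "       AS src(schemaname, tablename) "
        "JOIN pg_catalog.pg_stats s "
        "  ON s.schemaname = src.schemaname "
        " AND s.tablename = src.tablename "
        "ORDER BY src.ordinality";
    const char *params[2] = {schema_array, table_array};
    PGresult   *res;

    if (!PQsendQueryParams(conn, query, 2, NULL, params, NULL, NULL, 0))
        return;

    /* ask libpq to hand back rows up to 100 at a time (new in v17) */
    PQsetChunkedRowsMode(conn, 100);

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_CHUNK)
        {
            for (int i = 0; i < PQntuples(res); i++)
            {
                /* emit pg_restore_attribute_stats(...) for this row */
            }
        }
        PQclear(res);   /* free each chunk before fetching the next */
    }
}

Peak memory there is bounded by the chunk size rather than by the number of
relations, but it only helps if we're allowed to write each relation's
stats out as soon as its rows arrive.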
Anyway, here's a rebased set of the existing up-for-consideration patches,
plus the optimization of avoiding querying on non-expression indexes.
I should add that this set presently doesn't include a patch that reverts
the setlocale() and strtof() calls in favor of storing reltuples as a
string. As far as I know that idea is still on the table.
Attachment | Content-Type | Size
---|---|---
v6-0001-refactor-_tocEntryRequired-for-STATISTICS-DATA.patch | text/x-patch | 2.1 KB
v6-0002-Organize-and-deduplicate-statistics-import-tests.patch | text/x-patch | 92.7 KB
v6-0003-Split-relation-into-schemaname-and-relname.patch | text/x-patch | 62.3 KB
v6-0004-Avoid-getAttributeStats-call-for-indexes-without-.patch | text/x-patch | 11.8 KB