From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-03-06 04:04:29
Message-ID: CADkLM=f1n2_Vomq0gKab7xdxDHmJGgn=DE48P8fzQOp3Mrs1Qg@mail.gmail.com
Lists: pgsql-hackers
>
> > I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are
> > you suggesting we back the joins from pg_stats to pg_namespace and
> > pg_class and then filter by oids?
>
> I was thinking of one query per schema or something like that. But yea, a
> query to pg_namespace and pg_class wouldn't be a problem if we did it far
> fewer times than before. Or you could put the list of catalogs / tables
> to be queried into an unnest() with two arrays or such.
>
> Not sure how good the query plan for that would be, but it may be worth
> looking at.
>
Ok, so we're willing to take the pg_class/pg_namespace join hit for one or
a handful of queries, good to know.
> > Each call to getAttributeStats() fetches the pg_stats for one and only
> > one relation and then writes the SQL call to fout, then discards the
> > result set once all the attributes of the relation are done.
>
> I don't think that's true. For one, my example demonstrated that it
> increases the peak memory usage substantially. That'd not be the case if
> the data was just written out to stdout or such.
>
> Looking at the code confirms that. The ArchiveEntry() in
> dumpRelationStats() is never freed, afaict. And ArchiveEntry() strdups
> ->createStmt, which contains the "SELECT pg_restore_attribute_stats(...)".
>
Pardon my inexperience, but aren't the ArchiveEntry records needed right up
until the end of the program's run? If there's value in freeing them, why
isn't it being done already? What else would consume the freed memory?
>
> > > I don't think the query itself would be a problem, a query querying
> > > all the required stats should probably use PQsetSingleRowMode() or
> > > PQsetChunkedRowsMode().
> >
> > That makes sense if we get the attribute stats from the result set in
> > the order that we need them, and I don't know how we could possibly do
> > that. We'd still need a table to bsearch() and that would be huge.
>
> I'm not following - what would be the problem with a bsearch()? Compared
> to the stats data, an array to map from oid to an index in an array of
> stats data would be very small.
>
If we can do oid bsearch lookups, then we might be in business, but even
then we have to maintain an in-memory data structure of all the pg_stats
records relevant to this dump: either in PGresult form, in an intermediate
data structure like tblinfo/indxinfo, or as the resolved string of
pg_restore_attribute_stats() calls for that relation...which then gets
strdup'd into the ArchiveEntry that we have to maintain anyway.
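
For what it's worth, the mapping you describe could be as small as an array
of (oid, index) pairs, sorted once and probed with bsearch(). A minimal
sketch, with hypothetical names and not actual pg_dump code:

/*
 * Sketch of a small oid -> stats-array-index map: sort once with qsort(),
 * probe with bsearch().  Hypothetical names, not pg_dump code.
 */
#include <stdlib.h>

typedef struct OidIdxPair
{
    unsigned int oid;   /* relation OID */
    int          idx;   /* index into a separate array of stats data */
} OidIdxPair;

static int
oididx_cmp(const void *a, const void *b)
{
    unsigned int ao = ((const OidIdxPair *) a)->oid;
    unsigned int bo = ((const OidIdxPair *) b)->oid;

    return (ao > bo) - (ao < bo);
}

/*
 * Returns the stats index for oid, or -1 if not found.  pairs[] must
 * already be sorted: qsort(pairs, npairs, sizeof(*pairs), oididx_cmp).
 */
static int
lookup_stats_index(const OidIdxPair *pairs, size_t npairs, unsigned int oid)
{
    OidIdxPair  key = {oid, 0};
    const OidIdxPair *hit;

    hit = bsearch(&key, pairs, npairs, sizeof(OidIdxPair), oididx_cmp);
    return hit ? hit->idx : -1;
}

Even for a very large catalog that array is tiny next to the stats data
itself, which I take to be your point; my concern is what the "array of
stats data" it points into would cost us.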
>
> But with the unnest() idea from above it wouldn't even be needed, you
> could use
>
> SELECT ...
> FROM unnest(schema_array, table_array) WITH ORDINALITY
>      AS src(schemaname, tablename)
> ...
> ORDER BY ordinality
>
> or something along those lines.
>
This still seems to assume we have the freedom to generate a batch of these
rows, write them out, discard them, and then move on to the next logical
batch (perhaps by schema, as you suggested earlier), and I don't know that
we do. Perhaps we would have that freedom if stats were the absolute last
thing loaded in a dump.
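
To make that concrete, here's a minimal sketch of what one such batch could
look like, assuming PostgreSQL 17's chunked-rows mode; the function name,
query text, and array-literal encoding are illustrative only, not actual
pg_dump code:

/*
 * Sketch: fetch stats for one batch of relations, passed as two text
 * arrays, and consume the result in chunks so the full result set never
 * sits in memory at once.  Illustrative only, not pg_dump code.
 */
#include <libpq-fe.h>

static void
dump_stats_batch(PGconn *conn,
                 const char *schema_array,   /* e.g. "{public,public}" */
                 const char *table_array)    /* e.g. "{t1,t2}" */
{
    const char *query =
        "SELECT s.* "
        "FROM unnest($1::text[], $2::text[]) WITH ORDINALITY "
        "       AS src(schemaname, tablename) "
        "JOIN pg_catalog.pg_stats s "
        "  ON s.schemaname = src.schemaname "
        " AND s.tablename = src.tablename "
        "ORDER BY src.ordinality";
    const char *params[2] = {schema_array, table_array};
    PGresult   *res;

    if (!PQsendQueryParams(conn, query, 2, NULL, params, NULL, NULL, 0))
        return;

    /* ask libpq to hand back rows up to 100 at a time (new in v17) */
    PQsetChunkedRowsMode(conn, 100);

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_CHUNK)
        {
            for (int i = 0; i < PQntuples(res); i++)
            {
                /* emit pg_restore_attribute_stats(...) for this row */
            }
        }
        PQclear(res);   /* free each chunk before fetching the next */
    }
}

Peak memory there is bounded by the chunk size rather than by the number of
relations, but it only helps if we're allowed to write each relation's
stats out as soon as its rows arrive.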
Anyway, here's a rebased set of the existing up-for-consideration patches,
plus the optimization of avoiding querying on non-expression indexes.
I should add that this set presently doesn't include a patch that reverts
the setlocale() and strtof() calls in favor of storing reltuples as a
string. As far as I know that idea is still on the table.
Attachment | Content-Type | Size
---|---|---
v6-0001-refactor-_tocEntryRequired-for-STATISTICS-DATA.patch | text/x-patch | 2.1 KB
v6-0002-Organize-and-deduplicate-statistics-import-tests.patch | text/x-patch | 92.7 KB
v6-0003-Split-relation-into-schemaname-and-relname.patch | text/x-patch | 62.3 KB
v6-0004-Avoid-getAttributeStats-call-for-indexes-without-.patch | text/x-patch | 11.8 KB