Re: Statistics Import and Export

From: Andres Freund <andres(at)anarazel(dot)de>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-03-06 14:29:07
Message-ID: pbjbvxnq36xo2ooufcnlfqsqnxuvggbqxbx3myjtswokshfjbl@hfihx57z4j4t
Lists: pgsql-hackers

Hi,

On 2025-03-05 23:04:29 -0500, Corey Huinker wrote:
> > > I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you
> > > suggesting we back the joins from pg_stats to pg_namespace and pg_class and
> > > then filter by oids?
> >
> > I was thinking of one query per schema or something like that. But yea, a
> > query to pg_namespace and pg_class wouldn't be a problem if we did it far
> > fewer times than before. Or you could put the list of catalogs / tables to
> > be queried into an unnest() with two arrays or such.
> >
> > Not sure how good the query plan for that would be, but it may be worth
> > looking at.
> >
>
> Ok, so we're willing to take the pg_class/pg_namespace join hit for one or
> a handful of queries, good to know.

It's a tradeoff that needs to be evaluated. But I'd be rather surprised if it
weren't faster to run one query with the additional joins than hundreds of
queries without them.
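
For what it's worth, the unnest()-with-two-arrays idea mentioned above could
look something like the sketch below. This is only an illustration of the
shape of such a query, not the actual query pg_dump would issue; the schema
and table names are made up, and the real dump query would select specific
pg_stats columns rather than *.

```sql
-- Sketch: fetch statistics for a batch of relations in one query by
-- unnest()ing parallel arrays of schema and table names. unnest() with
-- multiple array arguments in FROM produces one row per array position.
SELECT s.*
FROM unnest(ARRAY['public', 'public'],
            ARRAY['orders', 'customers']) AS t(nspname, relname)
JOIN pg_stats s
  ON s.schemaname = t.nspname
 AND s.tablename  = t.relname;
```

Whether the planner does something sensible with the join against the
pg_stats view for large arrays is exactly the open question raised above.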

> > > Each call to getAttributeStats() fetches the pg_stats for one and only one
> > > relation and then writes the SQL call to fout, then discards the result set
> > > once all the attributes of the relation are done.
> >
> > I don't think that's true. For one, my example demonstrated that it
> > increases the peak memory usage substantially. That'd not be the case if
> > the data was just written out to stdout or such.
> >
> > Looking at the code confirms that. The ArchiveEntry() in
> > dumpRelationStats() is never freed, afaict. And ArchiveEntry() strdups
> > ->createStmt, which contains the "SELECT pg_restore_attribute_stats(...)".
> >
>
> Pardon my inexperience, but aren't the ArchiveEntry records needed right up
> until the program's run?

s/the/the end of the/?

> If there's value in freeing them, why isn't it being done already? What
> other thing would consume this freed memory?

I'm not saying that they can be freed, they can't right now. My point is just
that we *already* keep all the stats in memory, so the fact that fetching all
stats in a single query would also require keeping them in memory is not an
issue.

But TBH, I do wonder how much the current memory usage of the statistics
dump/restore support is going to bite us. In some cases this will dramatically
increase pg_dump/pg_upgrade's memory usage; my tests were with tiny amounts of
data and very simple scalar datatypes, and you could already see a substantial
increase. With something like postgis or even just a lot of jsonb columns
this is going to be way worse.

Greetings,

Andres Freund
