Quick Links

Re: Statistics Import and Export

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject:	Re: Statistics Import and Export
Date:	2025-02-21 20:57:20
Message-ID:	x5yo5ly5u3dhzfaq3hrzeqhju3io5zmwk4yqrkge3ywcflffck@ghbbgsfv56yb
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2025-02-21 15:23:00 -0500, Andres Freund wrote:
> On 2025-02-20 01:39:34 -0800, Jeff Davis wrote:
> > Committed with some revisions on top of v48:
>
> This made the pg_upgrade tests considerably slower.
>
> In an assert build without optimization (since that's what I use for normal
> dev work):
>
> 1fd1bd87101^ 65.03s
> 1fd1bd87101 86.84s
>
>
> Looking at the times in the in the regress_log, I see:
> [...]
> Which to me rather strongly suggests pg_dump has gotten a *lot* slower with
> this change.

Indeed. While the slowdown is worse with assertions and without compiler
optimizations, it's pretty bad otherwise too.

optimized, non-cassert, pg_dump and server with the regression database contents:

$ time ./src/bin/pg_dump/pg_dump regression > /dev/null

real 0m1.314s
user 0m0.189s
sys 0m0.059s

$ time ./src/bin/pg_dump/pg_dump --no-statistics regression > /dev/null

real 0m0.472s
user 0m0.179s
sys 0m0.035s

Unoptimized, cassert server and pg_dump:

$ time ./src/bin/pg_dump/pg_dump regression > /dev/null

real 0m9.008s
user 0m0.396s
sys 0m0.108s

$ time ./src/bin/pg_dump/pg_dump --no-statistics regression > /dev/null

real 0m2.590s
user 0m0.347s
sys 0m0.037s

Looking at the query log, the biggest culprit is a *lot* of additional
queries, I think primarily these two:

SELECT c.oid::regclass AS relation, current_setting('server_version_num') AS version, c.relpages, c.reltuples, c.relallvisible FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relname = 'alpha_neg_p2'

SELECT c.oid::regclass AS relation, s.attname,s.inherited,current_setting('server_version_num') AS version, s.null_frac,s.avg_width,s.n_distinct,s.most_common_vals,s.most_common_freqs,s.histogram_bounds,s.correlation,s.most_common_elems,s.most_common_elem_freqs,s.elem_count_histogram,s.range_length_histogram,s.range_empty_frac,s.range_bounds_histogram FROM pg_stats s JOIN pg_namespace n ON n.nspname = s.schemaname JOIN pg_class c ON c.relname = s.tablename AND c.relnamespace = n.oid WHERE s.schemaname = 'public' AND s.tablename = 'alpha_neg_p2' ORDER BY s.attname, s.inherited

I think there are a few things wrong here:

1) Why do we need to plan this over and over? Tom a while ago put in a fair
bit of work to make frequent queries use prepared statements.

In this case we spend more time replanning the query than executing it.

2) Querying this one-by-one makes this much more expensive than if it were
queried in a batched fashion, for multiple tables at once. This is
especially true if actually executed over network, rather than locally.

3) The query is unnecessarily expensive due to repeated joins gathering the
same information. pg_stats has a join to pg_namespace and pg_class, but
then the query above joins to both *again*.

And afaict the joins in the pg_stats query are pretty useless? Isn't all
that information already available in pg_stats? I guess you did that to get
it as a ::regclass, but isn't that already known, why requery it?

4) Why do we need to fetch the version twice for every table, that can't be
right? It won't change while pg_dump is running.

Greetings,

Andres Freund

In response to

Re: Statistics Import and Export at 2025-02-21 20:23:00 from Andres Freund

Responses

Re: Statistics Import and Export at 2025-02-21 21:24:38 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2025-02-21 21:11:48	Re: Statistics Import and Export
Previous Message	Mark Dilger	2025-02-21 20:50:21	Re: Amcheck verification of GiST and GIN