Re: Statistics Import and Export

From: Andres Freund <andres(at)anarazel(dot)de>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-03-06 01:17:53
Message-ID: rp6int5sqs5bn7g35brqz7w3t7il6tn2dgpm7tr3pev626wnc4@fi7a3enofhct
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-02-25 21:29:56 -0500, Corey Huinker wrote:
> On Tue, Feb 25, 2025 at 9:00 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> > On Mon, 2025-02-24 at 09:54 -0500, Andres Freund wrote:
> > > Have you compared performance of with/without stats after these
> > > optimizations?
> >
> > On unoptimized build with asserts enabled, dumping the regression
> > database:
> >
> > --no-statistics: 1.0s
> > master: 3.6s
> > v3j-0001: 3.0s
> > v3j-0002: 1.7s
> >
> > I plan to commit the patches soon.

I think these have all been committed, but I still see a larger performance
difference than what you observed. I just checked because I was noticing that
the tests are still considerably slower than they used to be.

An optimized pg_dump against an unoptimized assert-enabled server:

time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers --binary-upgrade --format=custom --no-sync regression > /dev/null
real 0m2.778s
user 0m0.167s
sys 0m0.057s

$ time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers --binary-upgrade --format=custom --no-sync --no-statistics regression > /dev/null

real 0m1.290s
user 0m0.097s
sys 0m0.026s

I thought it might be interesting to look at the set of queries arriving on
the server side, so I enabled pg-stat_statements and ran a dump:

regression[4041753][1]=# SELECT total_exec_time, total_plan_time, calls, plans, substring(query, 1, 30) FROM pg_stat_statements ORDER BY calls DESC LIMIT 15;
┌─────────────────────┬─────────────────────┬───────┬───────┬────────────────────────────────┐
│ total_exec_time │ total_plan_time │ calls │ plans │ substring │
├─────────────────────┼─────────────────────┼───────┼───────┼────────────────────────────────┤
│ 239.9672189999998 │ 12.5725 │ 981 │ 6 │ PREPARE getAttributeStats(pg_c │
│ 15.330405000000004 │ 1.836712 │ 282 │ 6 │ PREPARE dumpFunc(pg_catalog.oi │
│ 10.129114000000003 │ 0.39834800000000004 │ 199 │ 6 │ PREPARE dumpTableAttach(pg_cat │
│ 9.887489000000002 │ 0.9332620000000001 │ 84 │ 84 │ SELECT pg_get_partkeydef($1) │
│ 14.350725000000006 │ 0.691071 │ 60 │ 60 │ SELECT pg_catalog.pg_get_viewd │
│ 5.1174219999999995 │ 1.4604219999999999 │ 47 │ 6 │ PREPARE dumpAgg(pg_catalog.oid │
│ 0.24036199999999996 │ 0.545125 │ 41 │ 41 │ SELECT pg_catalog.format_type( │
│ 7.099635000000002 │ 0.47031800000000007 │ 39 │ 39 │ SELECT pg_catalog.pg_get_ruled │
│ 0.672752 │ 1.9036320000000002 │ 21 │ 6 │ PREPARE dumpDomain(pg_catalog. │
│ 1.6519299999999997 │ 3.1480380000000006 │ 21 │ 22 │ PREPARE getDomainConstraints(p │
│ 1.085548 │ 3.9647630000000005 │ 16 │ 6 │ PREPARE dumpCompositeType(pg_c │
│ 0.196259 │ 0.602291 │ 11 │ 6 │ PREPARE dumpOpr(pg_catalog.oid │
│ 0.265461 │ 4.428914 │ 10 │ 10 │ SELECT amprocnum, amproc::pg_c │
│ 0.39591399999999993 │ 9.345471 │ 10 │ 10 │ SELECT amopstrategy, amopopr:: │
│ 0.35752100000000003 │ 2.128437 │ 9 │ 9 │ SELECT nspname, tmplname FROM │
└─────────────────────┴─────────────────────┴───────┴───────┴────────────────────────────────┘

It looks a lot less bad with an optimized build:
regression[4042057][1]=# SELECT total_exec_time, total_plan_time, calls, plans, substring(query, 1, 30) FROM pg_stat_statements ORDER BY calls DESC LIMIT 15;
┌─────────────────────┬─────────────────────┬───────┬───────┬────────────────────────────────┐
│ total_exec_time │ total_plan_time │ calls │ plans │ substring │
├─────────────────────┼─────────────────────┼───────┼───────┼────────────────────────────────┤
│ 50.63764299999999 │ 2.503585 │ 981 │ 6 │ PREPARE getAttributeStats(pg_c │
│ 3.5241990000000007 │ 0.478541 │ 282 │ 6 │ PREPARE dumpFunc(pg_catalog.oi │
│ 2.3170359999999985 │ 0.126379 │ 199 │ 6 │ PREPARE dumpTableAttach(pg_cat │
│ 2.291331 │ 0.25360400000000005 │ 84 │ 84 │ SELECT pg_get_partkeydef($1) │
│ 4.678433000000003 │ 0.202578 │ 60 │ 60 │ SELECT pg_catalog.pg_get_viewd │
│ 1.1288440000000004 │ 0.30976200000000004 │ 47 │ 6 │ PREPARE dumpAgg(pg_catalog.oid │
│ 0.06619 │ 0.16813600000000004 │ 41 │ 41 │ SELECT pg_catalog.format_type( │
│ 2.102865 │ 0.115169 │ 39 │ 39 │ SELECT pg_catalog.pg_get_ruled │
│ 0.16163 │ 0.439991 │ 21 │ 6 │ PREPARE dumpDomain(pg_catalog. │
│ 0.5335120000000001 │ 0.727573 │ 21 │ 22 │ PREPARE getDomainConstraints(p │
│ 0.28177 │ 0.894156 │ 16 │ 6 │ PREPARE dumpCompositeType(pg_c │
│ 0.038558 │ 0.140807 │ 11 │ 6 │ PREPARE dumpOpr(pg_catalog.oid │
│ 0.082078 │ 0.9654280000000001 │ 10 │ 10 │ SELECT amprocnum, amproc::pg_c │
│ 0.136964 │ 2.1140120000000002 │ 10 │ 10 │ SELECT amopstrategy, amopopr:: │
│ 0.11634699999999999 │ 0.48550499999999996 │ 9 │ 9 │ SELECT nspname, tmplname FROM │
└─────────────────────┴─────────────────────┴───────┴───────┴────────────────────────────────┘
(15 rows)

This isn't even *remotely* an adversarial case, there are lots of workloads
with folks have a handful of indexes on each table and many many tables.

Right now --statistics more than doubles the number of queries that pg_dump
issues. That's oviously noticeable locally, but it's going to be really
noticeable when dumping across the network.

I think we need to do more to lessen the impact. Even leaving regression test
performance aside, the time increase for the default pg_dump invocation will
be painful for folks, particularly due to this being enabled by default.

One fairly easy win would be to stop issuing getAttributeStats() for
non-expression indexes. In most cases that'll already drastically cut down on
the extra queries.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2025-03-06 01:30:48 Re: Statistics Import and Export
Previous Message Sami Imseih 2025-03-06 01:17:16 Re: track generic and custom plans in pg_stat_statements