Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-11-08 18:25:21
Message-ID: CADkLM=fBaHichB7oS=LZNfTpGHSOTO44OKR28BwNmwu+9PicOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> WHAT IS NOT DONE - EXTENDED STATISTICS
>
> It is a general consensus in the community that "nobody uses extended
> statistics", though I've had difficulty getting actual figures to back this
> up, even from my own employer. Surveying several vendors at PgConf.EU, the
> highest estimate was that at most 1% of their customers used extended
> statistics, though more probably should. This reinforces my belief that a
> feature that would eliminate a major pain point in upgrades for 99% of
> customers shouldn't be held back by the fact that the other 1% only have a
> reduced hassle.
>
> However, having relation and attribute statistics carry over on major
> version upgrades presents a slight problem: running vacuumdb
> --analyze-in-stages after such an upgrade is completely unnecessary for
> those without extended statistics, and would actually result in _worse_
> statistics for the database until the last stage is complete. Granted,
> we've had great difficulty getting users to know that vacuumdb is a thing
> that should be run, but word has slowly spread through our own
> documentation and those "This one simple trick will make your postgres go
> fast post-upgrade" blog posts. Those posts will continue to lurk in search
> results long after this feature goes into release, and it would be a rude
> surprise to users to find out that the extra work they put in to learn
> about a feature that helped their upgrade in 17 was suddenly detrimental
> (albeit temporarily) in 18. We should never punish people for only being a
> little-bit current in their knowledge. Moreover, this surprise would
> persist even after we add extended statistics import function functionality.
>
> I presented this problem to several people at PgConf.EU, and the consensus
> least-bad solution was that vacuumdb should filter out tables that are not
> missing any statistics when using options --analyze, --analyze-only, and
> --analyze-in-stages, with an additional flag for now called --force-analyze
> to restore the un-filtered functionality. This gives the outcome tree:
>
> 1. Users who do not have extended statistics and do not use (or not even
> know about) vacuumdb will be blissfully unaware, and will get better
> post-upgrade performance.
> 2. Users who do not have extended statistics but use vacuumdb
> --analyze-in-stages will be pleasantly surprised that the vacuumdb run is
> almost a no-op, and completes quickly. Those who are surprised by this and
> re-run vacuumdb --analyze-in-stages will get another no-op.
> 3. Users who have extended statistics and use vacuumdb --analyze-in-stages
> will get a quicker vacuumdb run, as only the tables with extended stats
> will pass the filter. Subsequent re-runs of vacuumdb --analyze-in-stages
> would be the no-op.
> 4. Users who have extended statistics and don't use vacuumdb will still
> get better performance than they would have without any stats imported.
>
> In case anyone is curious, I'm defining "missing stats" as a table/matview
> with any of the following:
>
> 1. A table with an attribute that lacks a corresponding pg_statistic row.
> 2. A table with an index with an expression attribute that lacks a
> corresponding pg_statistic row (non-expression attributes just borrow the
> pg_statistic row from the table's attribute).
> 3. A table with at least one extended statistic that does not have a
> corresponding pg_statistic_ext_data row.
>
> Note that none of these criteria are concerned with the substance of the
> statistics (ex. pg_statistic row should have mcv stats but does not),
> merely their row-existence.
>
> Some rejected alternative solutions were:
>
> 1. Adding a new option --analyze-missing-stats. While simple, few people
> would learn about it, knowledge of it would be drowned out by the
> aforementioned sea of existing blog posts.
> 2. Adding --analyze-missing-stats and making --analyze-in-stages fail with
> an error message educating the user about --analyze-missing-stats. Users
> might not see the error, existing tooling wouldn't be able to act on the
> error, and there are legitimate non-upgrade uses of --analyze-in-stages.
>
> MAIN CONCERN GOING FORWARD
>
> This change to vacuumdb will require some reworking of the
> vacuum_one_database() function so that the list of tables analyzed is
> preserved across the stages, as subsequent stages runs won't be able to
> detect which tables were previously missing stats.
>
>

Here's a rebased patchset, with the changes to vacuum broken down into baby
steps to make things easier on a reviewer.

0001-0005 same as before, rebased.
0006-0009 structural changes to vacuumdb to make the new feature simpler.
0010 add test issues_sql_unlike, this was needed because we didn't have a
way to determine that a given line _wasn't_ printed.
0011 add --force-analyze to vacuumdb and filter out tables with existing
statistics otherwise.

Attachment Content-Type Size
v31-0004-Enable-in-place-updates-for-pg_restore_relation_.patch text/x-patch 5.4 KB
v31-0002-Remove-schemaOnly-dataOnly-from-dump-restore-opt.patch text/x-patch 10.9 KB
v31-0003-Enable-dumping-of-table-index-stats-in-pg_dump.patch text/x-patch 35.3 KB
v31-0005-Enable-pg_clear_relation_stats-to-handle-differe.patch text/x-patch 4.3 KB
v31-0001-Add-derivative-flags-dumpSchema-dumpData.patch text/x-patch 17.7 KB
v31-0009-preserve-catalog-lists-across-staged-runs.patch text/x-patch 6.7 KB
v31-0006-split-out-check_conn_options.patch text/x-patch 4.5 KB
v31-0008-split-out-generate_catalog_list.patch text/x-patch 7.1 KB
v31-0007-split-out-print_processing_notice.patch text/x-patch 2.4 KB
v31-0010-Add-issues_sql_unlike-opposite-of-issues_sql_lik.patch text/x-patch 1.5 KB
v31-0011-Add-force-analyze-to-vacuumdb.patch text/x-patch 12.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-11-08 18:40:13 Re: Avoiding superfluous buffer locking during nbtree backwards scans
Previous Message Masahiko Sawada 2024-11-08 18:25:17 Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4