Re: Statistics Import and Export

From: "Wetmore, Matthew (CTR)" <Matthew(dot)Wetmore(at)mdlive(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Magnus Hagander" <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Statistics Import and Export
Date: 2024-11-18 20:06:24
Message-ID: B7E10138-9F0E-4609-B993-93DFDC3D1CE2@glbcore.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry to chime in with a dumb question:

How would/could this effect tables that have the vacuum and analyze scale_factors different from the rest of db via the ALTE RTABLE statement?

(I do this a lot)

ALTER TABLE your_schema.your_table SET (autovacuum_enabled,autovacuum_analyze_scale_factor,autovacuum_vacuum_scale_factor);

Just wanted to mention

Thanks Bruce,

/matt

--

[MDLIVE]<https://www.mdlive.com/>
Matt Wetmore
Data Engineer
m. +1-415-416-9738

From: Bruce Momjian <bruce(at)momjian(dot)us>
Date: Monday, November 18, 2024 at 11:48 AM
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Statistics Import and Export

On Fri, Nov 8, 2024 at 01: 25: 21PM -0500, Corey Huinker wrote: > WHAT IS NOT DONE - EXTENDED STATISTICS > > It is a general consensus in the community that "nobody uses extended > statistics", though I've had difficulty getting actual

On Fri, Nov 8, 2024 at 01:25:21PM -0500, Corey Huinker wrote:

> WHAT IS NOT DONE - EXTENDED STATISTICS

>

> It is a general consensus in the community that "nobody uses extended

> statistics", though I've had difficulty getting actual figures to back this

> up, even from my own employer. Surveying several vendors at PgConf.EU, the

> highest estimate was that at most 1% of their customers used extended

> statistics, though more probably should. This reinforces my belief that a

> feature that would eliminate a major pain point in upgrades for 99% of

> customers shouldn't be held back by the fact that the other 1% only have a

> reduced hassle.

>

> However, having relation and attribute statistics carry over on major

> version upgrades presents a slight problem: running vacuumdb

> --analyze-in-stages after such an upgrade is completely unnecessary for

> those without extended statistics, and would actually result in _worse_

> statistics for the database until the last stage is complete. Granted,

> we've had great difficulty getting users to know that vacuumdb is a thing

> that should be run, but word has slowly spread through our own

> documentation and those "This one simple trick will make your postgres go

> fast post-upgrade" blog posts. Those posts will continue to lurk in search

> results long after this feature goes into release, and it would be a rude

> surprise to users to find out that the extra work they put in to learn

> about a feature that helped their upgrade in 17 was suddenly detrimental

> (albeit temporarily) in 18. We should never punish people for only being a

> little-bit current in their knowledge. Moreover, this surprise would

> persist even after we add extended statistics import function

> functionality.

>

> I presented this problem to several people at PgConf.EU, and the consensus

> least-bad solution was that vacuumdb should filter out tables that are not

> missing any statistics when using options --analyze, --analyze-only, and

> --analyze-in-stages, with an additional flag for now called --force-analyze

> to restore the un-filtered functionality. This gives the outcome tree:

>

> 1. Users who do not have extended statistics and do not use (or not even

> know about) vacuumdb will be blissfully unaware, and will get better

> post-upgrade performance.

> 2. Users who do not have extended statistics but use vacuumdb

> --analyze-in-stages will be pleasantly surprised that the vacuumdb run is

> almost a no-op, and completes quickly. Those who are surprised by this and

> re-run vacuumdb --analyze-in-stages will get another no-op.

> 3. Users who have extended statistics and use vacuumdb --analyze-in-stages

> will get a quicker vacuumdb run, as only the tables with extended stats

> will pass the filter. Subsequent re-runs of vacuumdb --analyze-in-stages

> would be the no-op.

> 4. Users who have extended statistics and don't use vacuumdb will still get

> better performance than they would have without any stats imported.

>

> In case anyone is curious, I'm defining "missing stats" as a table/matview

> with any of the following:

>

> 1. A table with an attribute that lacks a corresponding pg_statistic row.

> 2. A table with an index with an expression attribute that lacks a

> corresponding pg_statistic row (non-expression attributes just borrow the

> pg_statistic row from the table's attribute).

> 3. A table with at least one extended statistic that does not have a

> corresponding pg_statistic_ext_data row.

>

> Note that none of these criteria are concerned with the substance of the

> statistics (ex. pg_statistic row should have mcv stats but does not),

> merely their row-existence.

>

> Some rejected alternative solutions were:

>

> 1. Adding a new option --analyze-missing-stats. While simple, few people

> would learn about it, knowledge of it would be drowned out by the

> aforementioned sea of existing blog posts.

> 2. Adding --analyze-missing-stats and making --analyze-in-stages fail with

> an error message educating the user about --analyze-missing-stats. Users

> might not see the error, existing tooling wouldn't be able to act on the

> error, and there are legitimate non-upgrade uses of --analyze-in-stages.

>

> MAIN CONCERN GOING FORWARD

>

> This change to vacuumdb will require some reworking of the

> vacuum_one_database() function so that the list of tables analyzed is

> preserved across the stages, as subsequent stages runs won't be able to

> detect which tables were previously missing stats.

You seem to be optimizing for people using pg_upgrade, and for people

upgrading to PG 18, without adequately considering people using vacuumdb

in non-pg_upgrade situations, and people using PG 19+. Let me explain.

First, I see little concern here for how people who use --analyze and

--analyze-only independent of pg_upgrade will be affected by this.

While I recommend people decrease vacuum and analyze threshold during

non-peak periods:

https://urldefense.com/v3/__https://momjian.us/main/blogs/pgblog/2017.html*January_3_2017__;Iw!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrCfXFTNfw$<https://urldefense.com/v3/__https:/momjian.us/main/blogs/pgblog/2017.html*January_3_2017__;Iw!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrCfXFTNfw$>

some people might just regenerate all statistics during non-peak periods

using these options. You can perhaps argue that --analyze-in-stages

would only be used by pg_upgrade so maybe that can be adjusted more

easily.

Second, the API for what --analyze and --analyze-only do will be very

confusing for people running, e.g., PG 20, because the average user

reading the option name will not guess it only adds missing statistics.

I think you need to rethink your approach and just accept that a mention

of the new preserving statistic behavior of pg_upgrade, and the new

vacuumdb API required, will be sufficient. In summary, I think you need

a new --compute-missing-statistics-only that can be combined with

--analyze, --analyze-only, and --analyze-in-stages to compute only

missing statistics, and document it in the PG 18 release notes.

Frankly, we have a similar problem with partitioned tables:

https://urldefense.com/v3/__https://www.postgresql.org/docs/current/sql-analyze.html__;!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrAcfiLyFQ$<https://urldefense.com/v3/__https:/www.postgresql.org/docs/current/sql-analyze.html__;!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrAcfiLyFQ$>

For partitioned tables, ANALYZE gathers statistics by sampling

rows from all partitions; in addition, it will recurse into

each partition and update its statistics. Each leaf partition

is analyzed only once, even with multi-level partitioning. No

statistics are collected for only the parent table (without data

from its partitions), because with partitioning it's guaranteed

to be empty.

--> The autovacuum daemon does not process partitioned tables, nor

does it process inheritance parents if only the children are ever

modified. It is usually necessary to periodically run a manual

ANALYZE to keep the statistics of the table hierarchy up to date.

Now, you can say partitioned table statistics are not as important as

extended statistics, but that fact remains that we have these two odd

cases where special work must be done to generate statistics.

--

Bruce Momjian <bruce(at)momjian(dot)us> https://urldefense.com/v3/__https://momjian.us__;!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrCf9HVZng$<https://urldefense.com/v3/__https:/momjian.us__;!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrCf9HVZng$>

EDB https://urldefense.com/v3/__https://enterprisedb.com__;!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrBvicktqQ$<https://urldefense.com/v3/__https:/enterprisedb.com__;!!JtyhRs6NQr98nj80!5p5jBUgKYWVIwaCN-IBjzU3lWjh76cjhyOBvdbX0A7yWbineV7Ax1atOztVc1CUn_MTpqVGWkyGKDrBvicktqQ$>

When a patient asks the doctor, "Am I going to die?", he means

"Am I going to die soon?"

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Torsten Förtsch 2024-11-18 20:21:56 PGSERVICEFILE as part of a normal connection string
Previous Message Bruce Momjian 2024-11-18 19:59:18 Re: Making error message more user-friendly with spaces in a URI