From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org, jian he <jian(dot)universality(at)gmail(dot)com> |
Subject: | Re: Statistics Import and Export |
Date: | 2025-02-28 02:32:20 |
Message-ID: | 768b2a237e892bf1334bdcbb066cdc1bd1368cb2.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 2025-02-25 at 11:11 +0530, Ashutosh Bapat wrote:
> So the dumped statistics are not restored exactly. The reason for
> this
> is the table statistics is dumped before dumping ALTER TABLE ... ADD
> CONSTRAINT command which changes the statistics. I think all the
> pg_restore_relation_stats() calls should be dumped after all the
> schema and data modifications have been done. OR what's the point in
> dumping statistics only to get rewritten even before restore
> finishes.
In your example, it's not so bad because the stats are actually better:
the index is built after the data is present, and therefore relpages
and reltuples are correct.
The problem is more clear if you use --no-data. If you load data,
ANALYZE, pg_dump --no-data, then reload the sql file, then the stats
are lost.
That workflow is very close to what pg_upgrade does. We solved the
problem for pg_upgrade in commit 71b66171d0 by simply not updating the
statistics when building an index and IsBinaryUpgrade.
To solve the issue with dump --no-data, I propose that we change the
test in 71b66171d0 to only update the stats if the physical relpages is
non-zero.
Patch attached:
* If the dump is --no-data, or during pg_upgrade, the table will be
empty, so the physical relpages will be zero and the restored stats
won't be overwritten.
* If (like in your example) the dump includes data, the new stats are
based on real data, so they are better anyway. This is sort of like the
case where autoanalyze kicks in.
* If the dump is --statistics-only, then there won't be any indexes
created in the SQL file, so when you restore the stats, they will
remain until you do something else to change them.
* If your example really is a problem, you'd need to dump first with -
-no-statistics, and then with --statistics-only, and restore the two
SQL files in order.
Alternatively, we could put stats into SECTION_POST_DATA, which was
already discussed[*], and we decided against it (though there was not a
clear consensus).
Regards,
Jeff Davis
*:
https://www.postgresql.org/message-id/1798867.1712376328%40sss.pgh.pa.us
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Do-not-update-stats-on-empty-table-when-building-.patch | text/x-patch | 4.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-02-28 02:34:14 | Re: per backend WAL statistics |
Previous Message | Thomas Munro | 2025-02-28 02:13:46 | Re: Confine vacuum skip logic to lazy_scan_skip |