| From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
| Subject: | Re: Statistics Import and Export |
| Date: | 2024-04-23 16:33:48 |
| Message-ID: | CAEze2WiX_YWEhe4iLSLHEkSJimE0=wU1cGhmPikurZGOTrh1tg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, 23 Apr 2024, 05:52 Tom Lane, <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > On Mon, 2024-04-22 at 16:19 -0400, Tom Lane wrote:
> >> Loading data without stats, and hoping
> >> that auto-analyze will catch up sooner not later, is exactly the
> >> current behavior that we're doing all this work to get out of.
>
> > That's the disconnect, I think. For me, the main reason I'm excited
> > about this work is as a way to solve the bad-plans-after-upgrade
> > problem and to repro planner issues outside of production. Avoiding the
> > need to ANALYZE at the end of a data load is also a nice convenience,
> > but not a primary driver (for me).
>
> Oh, I don't doubt that there are use-cases for dumping stats without
> data. I'm just dubious about the reverse. I think data+stats should
> be the default, even if only because pg_dump's default has always
> been to dump everything. Then there should be a way to get stats
> only, and maybe a way to get data only. Maybe this does argue for a
> four-section definition, despite the ensuing churn in the pg_dump API.
I've heard of use cases where dumping stats without data would help
with production database planner debugging on a non-prod system.
Sure, some planner inputs would have to be taken into account too, but
having an exact copy of production stats is at least a start and can
help build models and alerts for what'll happen when the tables grow
larger with the current stats.
As for other planner inputs: table size is relatively easy to shim
with sparse files; cumulative statistics can be copied from a donor
replica if needed, and btree indexes only really really need to
contain their highest and lowest values (and need their height set
correctly).
Kind regards,
Matthias van de Meent
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2024-04-23 16:59:39 | Re: gcc 12.1.0 warning |
| Previous Message | Bertrand Drouvot | 2024-04-23 16:20:46 | Re: Avoid orphaned objects dependencies, take 3 |