From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2025-03-06 18:04:55 |
Message-ID: | urk5wcvgvnqzmvdvbe6b2gthhf3pphfqmbwculciy3lm4kvnay@avvgvhxo544p |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-03-06 12:16:44 -0500, Corey Huinker wrote:
> >
> > To be honest, I am a bit surprised that we decided to enable this by
> > default. It's not obvious to me that statistics should be regarded as
> > part of the database in the same way that table definitions or table
> > data are. That said, I'm not overwhelmingly opposed to that choice.
> > However, even if it's the right choice in theory, we should maybe
> > rethink if it's going to be too slow or use too much memory.
> >
>
> I'm strongly in favor of the choice to make it default. This is reducing
> the impact of a post-upgrade customer footgun wherein heavy workloads are
> applied to a database post-upgrade but before analyze/vacuumdb have had a
> chance to do their magic [1].
To be clear, I think this is a very important improvement that most people
should use. I just don't think it's quite there yet.
> It seems to me that we're fretting over seconds when the feature is
> potentially saving the customer hours of reduced availability if not
> outright downtime.
FWIW, I care about the performance for two reasons:
1) It's a difference of seconds in the regression database, which has a few
hundred tables, few columns, very little data and thus small stats. In a
database with a lot of tables and columns with complicated datatypes the
difference will be far larger.
And in contrast to analyzing the database in parallel, the pg_dump/restore
work to restore stats afaict happens single-threaded for each database.
2) The changes initially substantially increased the time a test cycle takes
for me locally. I run the tests 10s to 100s time a day, that really adds
up.
002_pg_upgrade is the test that dominates the overall test time for me, so
it getting slower by a good bit means the overall test time increased.
1fd1bd87101^:
total test time: 1m27.010s
003_pg_upgrade alone: 1m6.309s
1fd1bd87101:
total test time: 1m45.945s
003_pg_upgrade alone: 1m24.597s
master at 0f21db36d66:
total test time: 1m34.576s
003_pg_upgrade alone: 1m12.550s
It clearly got a lot better since 1fd1bd87101, but it's still ~9% slower
than before...
I care about the memory usage effects because I've seen plenty systems where
pg_statistics is many gigabytes (after toast compression!), and I am really
worried that pg_dump having all the serialized strings in memory will cause a
lot of previously working pg_dump invocations and pg_upgrades to fail. That'd
also be a really bad experience.
The more I think about it, the less correct it seems to me to have the
statement to restore statistics tracked via ArchiveOpts->createStmt. We use
that for DDL, but this really is data, not DDL. Because we store it in
->createStmt it's stored in-memory for the runtime of pg_dump, which means the
peak memory usage will inherently be quite high.
I think the stats need to be handled much more like we handle the actual table
data, which are obviously *not* stored in memory for the whole run of pg_dump.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-03-06 18:07:28 | Re: what's going on with lapwing? |
Previous Message | Corey Huinker | 2025-03-06 18:00:07 | Re: Statistics Import and Export |