Re: Statistics Import and Export

From: Andres Freund <andres(at)anarazel(dot)de>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-03-06 18:04:55
Message-ID: urk5wcvgvnqzmvdvbe6b2gthhf3pphfqmbwculciy3lm4kvnay@avvgvhxo544p
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-03-06 12:16:44 -0500, Corey Huinker wrote:
> >
> > To be honest, I am a bit surprised that we decided to enable this by
> > default. It's not obvious to me that statistics should be regarded as
> > part of the database in the same way that table definitions or table
> > data are. That said, I'm not overwhelmingly opposed to that choice.
> > However, even if it's the right choice in theory, we should maybe
> > rethink if it's going to be too slow or use too much memory.
> >
>
> I'm strongly in favor of the choice to make it default. This is reducing
> the impact of a post-upgrade customer footgun wherein heavy workloads are
> applied to a database post-upgrade but before analyze/vacuumdb have had a
> chance to do their magic [1].

To be clear, I think this is a very important improvement that most people
should use. I just don't think it's quite there yet.

> It seems to me that we're fretting over seconds when the feature is
> potentially saving the customer hours of reduced availability if not
> outright downtime.

FWIW, I care about the performance for two reasons:

1) It's a difference of seconds in the regression database, which has a few
hundred tables, few columns, very little data and thus small stats. In a
database with a lot of tables and columns with complicated datatypes the
difference will be far larger.

And in contrast to analyzing the database in parallel, the pg_dump/restore
work to restore stats afaict happens single-threaded for each database.

2) The changes initially substantially increased the time a test cycle takes
for me locally. I run the tests 10s to 100s time a day, that really adds
up.

002_pg_upgrade is the test that dominates the overall test time for me, so
it getting slower by a good bit means the overall test time increased.

1fd1bd87101^:
total test time: 1m27.010s
003_pg_upgrade alone: 1m6.309s

1fd1bd87101:
total test time: 1m45.945s
003_pg_upgrade alone: 1m24.597s

master at 0f21db36d66:
total test time: 1m34.576s
003_pg_upgrade alone: 1m12.550s

It clearly got a lot better since 1fd1bd87101, but it's still ~9% slower
than before...

I care about the memory usage effects because I've seen plenty systems where
pg_statistics is many gigabytes (after toast compression!), and I am really
worried that pg_dump having all the serialized strings in memory will cause a
lot of previously working pg_dump invocations and pg_upgrades to fail. That'd
also be a really bad experience.

The more I think about it, the less correct it seems to me to have the
statement to restore statistics tracked via ArchiveOpts->createStmt. We use
that for DDL, but this really is data, not DDL. Because we store it in
->createStmt it's stored in-memory for the runtime of pg_dump, which means the
peak memory usage will inherently be quite high.

I think the stats need to be handled much more like we handle the actual table
data, which are obviously *not* stored in memory for the whole run of pg_dump.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-03-06 18:07:28 Re: what's going on with lapwing?
Previous Message Corey Huinker 2025-03-06 18:00:07 Re: Statistics Import and Export