Re: Statistics Import and Export

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-02-21 21:11:48
Message-ID: qzwh4yv62shsxkkifluvvcgfxb6vxp7zydrhd5v4yk7bl7kl2f@3qg6xjsubtyh
Lists: pgsql-hackers

Hi,

On 2025-02-21 15:49:10 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Which to me rather strongly suggests pg_dump has gotten a *lot* slower with
> > this change.
>
> Well, it's doing strictly more work, so somewhat slower is to be
> expected.

Yea, if we were talking about a few percent, I'd not have balked. It's more
like 2-4x though, and it'll probably be worse when not connecting over a local
TCP connection.

This is a slowdown big enough that the downtime for pg_upgrade will be
substantially longer than before. But I think we should be able to address
that to a large degree.

> In a quick look at the committed patch, it doesn't seem to have
> used any of the speedup strategies we applied to pg_dump a couple
> of years ago. One or the other of these should help:
>
> * Issue a single query to fetch stats from every table we're dumping
> * Set up a prepared query to avoid re-planning the per-table query
> (compare be85727a3)
>
> I'm not sure how workable the first of these would be though.
> It's not hard to imagine it blowing out pg_dump's memory usage
> for a DB with a lot of tables and high default_statistics_target.

We could presumably do the one-query approach for the relation stats; those
are just three integers per table. That way we'd at least not end up with two
queries for each table (one for pg_class.reltuples etc. and one for pg_stats).
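
Something like this completely untested sketch, where the oid array would be
built from pg_dump's list of dumped relations (function and variable names
just illustrative):

#include <libpq-fe.h>

/*
 * Sketch only: fetch relation-level stats for every dumped table in a
 * single round trip.  "oid_array" is a text-format oid array such as
 * "{16384,16385}".
 */
static void
fetch_all_relstats(PGconn *conn, const char *oid_array)
{
	const char *params[1] = {oid_array};
	PGresult   *res;

	res = PQexecParams(conn,
					   "SELECT oid, reltuples, relpages, relallvisible "
					   "FROM pg_catalog.pg_class "
					   "WHERE oid = ANY($1::pg_catalog.oid[])",
					   1, NULL, params, NULL, NULL, 0);

	if (PQresultStatus(res) == PGRES_TUPLES_OK)
	{
		for (int i = 0; i < PQntuples(res); i++)
		{
			/* stash reltuples/relpages/relallvisible for this relation */
		}
	}
	PQclear(res);
}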

I guess the memory usage could also be addressed by using COPY, but that's
probably unrealistically complicated.

> The second one should be relatively downside-free.

Yea. And at least with pg_dump running locally, that's where a lot of the CPU
time is spent.
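
I.e. roughly something like this (untested; the statement name and the exact
query are just placeholders for whatever the stats code actually issues per
table):

/*
 * Sketch of the be85727a3-style approach: prepare the per-table stats
 * query once, then just EXECUTE it for each relation, avoiding repeated
 * parse/plan overhead.
 */
static void
prepare_attr_stats_query(PGconn *conn)
{
	PGresult   *res;

	res = PQprepare(conn, "getAttrStats",
					"SELECT * FROM pg_catalog.pg_stats "
					"WHERE schemaname = $1 AND tablename = $2",
					2, NULL);
	PQclear(res);
}

static PGresult *
fetch_attr_stats(PGconn *conn, const char *nspname, const char *relname)
{
	const char *params[2] = {nspname, relname};

	return PQexecPrepared(conn, "getAttrStats",
						  2, params, NULL, NULL, 0);
}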

Doing lots of one-by-one queries remotely will hurt even with prepared
statements, though.

One way to largely address that would be to use a prepared statement combined
with libpq pipelining. That still has separate executor startup etc., but I
think it should reduce the cost to a point where we don't care anymore.
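
E.g. something like the below (untested; reuses the hypothetical
"getAttrStats" prepared statement from the earlier sketch). We'd queue all
the EXECUTEs without waiting for any replies, send one sync, and then drain
the results:

static void
fetch_stats_pipelined(PGconn *conn,
					  const char **nspnames, const char **relnames,
					  int nrels)
{
	PGresult   *res;

	PQenterPipelineMode(conn);

	/* queue one EXECUTE per relation, no round trip each */
	for (int i = 0; i < nrels; i++)
	{
		const char *params[2] = {nspnames[i], relnames[i]};

		PQsendQueryPrepared(conn, "getAttrStats",
							2, params, NULL, NULL, 0);
	}
	PQpipelineSync(conn);

	/* replies come back in queue order */
	for (int i = 0; i < nrels; i++)
	{
		while ((res = PQgetResult(conn)) != NULL)
		{
			if (PQresultStatus(res) == PGRES_TUPLES_OK)
			{
				/* emit the stats for relation i */
			}
			PQclear(res);
		}
	}

	/* consume the PGRES_PIPELINE_SYNC terminating the pipeline */
	res = PQgetResult(conn);
	PQclear(res);
	PQexitPipelineMode(conn);
}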

Greetings,

Andres Freund
