From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-02-21 21:11:48
Message-ID: qzwh4yv62shsxkkifluvvcgfxb6vxp7zydrhd5v4yk7bl7kl2f@3qg6xjsubtyh
Lists: pgsql-hackers
Hi,
On 2025-02-21 15:49:10 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Which to me rather strongly suggests pg_dump has gotten a *lot* slower with
> > this change.
>
> Well, it's doing strictly more work, so somewhat slower is to be
> expected.
Yea, if we were talking about a few percent, I'd not have balked. It's more
like 2-4x though, and it'll probably be worse when not connecting over a
local TCP connection.
This slowdown is big enough that the downtime for pg_upgrade will be
substantially lengthened compared to before. But I think we should be able to
address that to a large degree.
> In a quick look at the committed patch, it doesn't seem to have
> used any of the speedup strategies we applied to pg_dump a couple
> of years ago. One or the other of these should help:
>
> * Issue a single query to fetch stats from every table we're dumping
> * Set up a prepared query to avoid re-planning the per-table query
> (compare be85727a3)
>
> I'm not sure how workable the first of these would be though.
> It's not hard to imagine it blowing out pg_dump's memory usage
> for a DB with a lot of tables and high default_statistics_target.
We could presumably do the one-query approach for the relation stats; that's
just three integers per table. That way we'd at least not end up with two
queries for each table (one for pg_class.reltuples etc. and one for pg_stats).
I guess the memory usage could also be addressed by using COPY, but that's
probably unrealistically complicated.
> The second one should be relatively downside-free.
Yea. And at least with pg_dump running locally, that's where a lot of the CPU
time is spent.
Doing lots of one-by-one queries remotely will hurt even with prepared
statements, though.
One way to largely address that would be to use a prepared statement combined
with libpq pipelining. That still pays separate executor startup etc. for
each query, but I think it should reduce the cost to the point where we don't
care anymore.
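Roughly like this (a hypothetical sketch, not pg_dump's actual code; the
statement name, the pg_statistic query and the error handling are all made up
for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    static void
    fetch_stats_pipelined(PGconn *conn, const char *const *reloids, int ntables)
    {
        if (!PQenterPipelineMode(conn))
            exit(1);

        /* Prepare once; the plan gets reused for every table. */
        if (!PQsendPrepare(conn, "getstats",
                           "SELECT * FROM pg_catalog.pg_statistic "
                           "WHERE starelid = $1",
                           1, NULL))
            exit(1);

        /* Queue one execution per table, without waiting for results. */
        for (int i = 0; i < ntables; i++)
        {
            const char *values[1] = {reloids[i]};

            if (!PQsendQueryPrepared(conn, "getstats", 1, values,
                                     NULL, NULL, 0))
                exit(1);
        }

        /* End the batch; everything is flushed in one round trip. */
        if (!PQpipelineSync(conn))
            exit(1);

        /*
         * Results arrive in order: one for the prepare, then one per table,
         * each terminated by a NULL return from PQgetResult.
         */
        for (int i = 0; i < ntables + 1; i++)
        {
            PGresult   *res;

            while ((res = PQgetResult(conn)) != NULL)
            {
                if (PQresultStatus(res) == PGRES_FATAL_ERROR)
                    fprintf(stderr, "%s", PQerrorMessage(conn));
                /* ... emit the statistics for this table ... */
                PQclear(res);
            }
        }

        /* Consume the PGRES_PIPELINE_SYNC marker and leave pipeline mode. */
        PQclear(PQgetResult(conn));
        PQexitPipelineMode(conn);
    }

Batching a few hundred tables per sync would presumably amortize the round
trips well enough without queuing unbounded result sets on the client.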
Greetings,
Andres Freund