From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-03-16 20:33:49
Message-ID: Z9c1rbzZegYQTOQE@nathan
Lists: pgsql-hackers

Thanks for working on this, Corey.
On Fri, Mar 14, 2025 at 04:03:16PM -0400, Corey Huinker wrote:
> 0003 -
>
> Storing the restore function calls in the archive entry hogged a lot of
> memory and made people nervous. This introduces a new function pointer that
> generates those restore SQL calls right before they're written to disk,
> thus reducing the memory load from "stats for every object to be dumped" to
> just one object. Thanks to Nathan for diagnosing some weird quirks with
> various formats.
>
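
To make sure I understand the 0003 approach, here is a minimal sketch of
the deferred-generation idea as I read it; the type and function names
below are mine for illustration, not the patch's actual TOC structures:

/*
 * Minimal sketch of deferred restore-SQL generation.  Instead of storing
 * the finished statement in every archive entry up front, an entry can
 * carry a callback plus the argument it needs, and the SQL is built only
 * at the moment the entry is written out, so at most one entry's SQL is
 * in memory at a time.  (Names are illustrative, not pg_dump's.)
 */
#include <stdio.h>
#include <stdlib.h>

typedef char *(*DefnGenerator) (void *arg);

typedef struct SketchEntry
{
	char	   *defn;			/* pre-built SQL, or NULL */
	DefnGenerator defnGen;		/* builds SQL on demand when defn is NULL */
	void	   *defnArg;		/* e.g., the relation's OID */
} SketchEntry;

static void
writeEntryDefn(FILE *out, SketchEntry *te)
{
	if (te->defnGen != NULL)
	{
		char	   *sql = te->defnGen(te->defnArg); /* just-in-time */

		fputs(sql, out);
		free(sql);				/* freed immediately after writing */
	}
	else
		fputs(te->defn, out);
}
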
> 0004 -
>
> This replaces the query in the prepared statement with one that batches
> them 100 relations at a time, and then maintains that result set until it
> is consumed. It seems to have obvious speedups.
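
And for 0004, the batching pattern I tested looks roughly like the libpq
sketch below; the statement name, query, and buffer handling are
illustrative, and the real patch prepares a different query:

#include <stdio.h>
#include <libpq-fe.h>

#define STATS_BATCH_SIZE 100	/* relations per round trip */

static void
fetchStatsBatched(PGconn *conn, const unsigned int *reloids, int nrels)
{
	PGresult   *res;

	/* One statement, prepared once, taking an array of relation OIDs. */
	res = PQprepare(conn, "getStats",
					"SELECT oid, reltuples, relpages FROM pg_class "
					"WHERE oid = ANY($1)", 1, NULL);
	PQclear(res);

	for (int i = 0; i < nrels; i += STATS_BATCH_SIZE)
	{
		char		arraybuf[8192];
		char	   *p = arraybuf;
		const char *params[1];
		int			lim = (i + STATS_BATCH_SIZE < nrels) ?
			i + STATS_BATCH_SIZE : nrels;

		/* Build an array literal like {16384,16385,...} for $1. */
		*p++ = '{';
		for (int j = i; j < lim; j++)
			p += snprintf(p, sizeof(arraybuf) - (p - arraybuf),
						  "%s%u", (j > i) ? "," : "", reloids[j]);
		*p++ = '}';
		*p = '\0';

		params[0] = arraybuf;
		res = PQexecPrepared(conn, "getStats", 1, params,
							 NULL, NULL, 0);

		/* Consume the whole batch before issuing the next query. */
		if (PQresultStatus(res) == PGRES_TUPLES_OK)
			for (int j = 0; j < PQntuples(res); j++)
				printf("%s: %s tuples\n",
					   PQgetvalue(res, j, 0), PQgetvalue(res, j, 1));
		PQclear(res);
	}
}
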
I've been doing a variety of tests with my toy database of 100K relations
[0], and I'm seeing around 20% less memory usage. That's still 20% more
than a dump without stats, but it's a pretty nice improvement nonetheless.
I'd propose two small changes to the design:
* I tested a variety of batch sizes, and to my surprise, I saw the best
results with around 64 relations per batch. I imagine the absolute best
batch size will vary greatly depending on the workload. It might also
depend on work_mem and friends.
* The custom format actually does two WriteToc() calls, and since these
patches move the queries to this part of pg_dump, it means we'll run all
the queries twice. The comments around this code suggest that the second
pass isn't strictly necessary and is really only useful for data/parallel
restore, so we could probably skip it for no-data dumps (see the sketch
below).
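
For that second change, I'm picturing something like the fragment below in
the custom format's _CloseArchive(), where the TOC gets rewritten after
the data; dumpHasDataEntries() is an invented placeholder for whatever
test we settle on, so treat this as a sketch rather than the actual patch:

	/*
	 * Fragment-level sketch only: guard the second WriteToc() pass so it
	 * runs only when the dump has data entries that benefit from the
	 * offset rewrite.  dumpHasDataEntries() is invented for illustration.
	 */
	if (ctx->hasSeek && dumpHasDataEntries(AH) &&
		fseeko(AH->FH, tpos, SEEK_SET) == 0)
		WriteToc(AH);			/* second pass, skipped for no-data dumps */
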
With those two changes, a pg_upgrade-style dump of my test database goes
from ~21.6 seconds without these patches to ~11.2 seconds with them. For
reference, the same dump without stats takes ~7 seconds on HEAD.
[0] https://postgr.es/m/Z9R9-mFbxukqKmg4%40nathan
--
nathan