From: | Nathan Bossart <nathan(at)postgresql(dot)org> |
---|---|
To: | pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | pgsql: pg_dump: Reduce memory usage of dumps with statistics. |
Date: | 2025-04-04 19:51:56 |
Message-ID: | E1u0n52-002gUJ-2M@gemulon.postgresql.org |
Lists: | pgsql-committers |
pg_dump: Reduce memory usage of dumps with statistics.
Right now, pg_dump stores all generated commands for statistics in
memory. These commands can be quite large and therefore can
significantly increase pg_dump's memory footprint. To fix, wait
until we are about to write out the commands before generating
them, and be sure to free the commands after writing. This is
implemented via a new defnDumper callback that works much like the
dataDumper one but is specifically designed for TOC entries.
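
As a rough illustration of the deferred-generation idea described above, here is a minimal, self-contained sketch. It is not pg_dump's actual code: SketchEntry, SketchSink, sketch_stats_defn and sketch_write_entry are hypothetical names invented for this example, and the in-memory sink merely stands in for the archive output. The point is only that the command text exists between the callback call and the free, rather than for the whole run.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory output; stands in for the archive being written. */
typedef struct
{
    char   buf[256];
    size_t len;
} SketchSink;

/* Hypothetical stand-in for a TOC entry; not pg_dump's real TocEntry. */
typedef struct SketchEntry SketchEntry;
typedef char *(*SketchDefnDumper) (const SketchEntry *te);

struct SketchEntry
{
    const char      *tag;        /* object name */
    const char      *defn;       /* eagerly stored definition, or NULL */
    SketchDefnDumper defnDumper; /* builds the definition on demand, or NULL */
};

/* Example callback: allocates the command text only when it is asked for. */
char *
sketch_stats_defn(const SketchEntry *te)
{
    const char *fmt = "-- statistics for %s\n";
    size_t      size = strlen(fmt) + strlen(te->tag) + 1;
    char       *buf = malloc(size);

    if (buf != NULL)
        snprintf(buf, size, fmt, te->tag);
    return buf;
}

/* Write one entry and return the number of bytes appended to the sink. */
size_t
sketch_write_entry(SketchSink *out, const SketchEntry *te)
{
    char       *generated = NULL;
    const char *defn = te->defn;
    size_t      n = 0;

    /* Prefer the callback: generate the text right before writing it. */
    if (te->defnDumper != NULL)
        defn = generated = te->defnDumper(te);

    if (defn != NULL)
    {
        n = strlen(defn);
        assert(out->len + n <= sizeof(out->buf));
        memcpy(out->buf + out->len, defn, n);
        out->len += n;
    }

    /* Release the generated text immediately; free(NULL) is a no-op. */
    free(generated);
    return n;
}
```

An entry with a callback and no stored definition holds only its tag in memory until sketch_write_entry runs, while entries with an eagerly stored definition keep working unchanged.
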
Custom dumps that include data might write the TOC twice (to update
data offset information), which would ordinarily cause pg_dump to
run the attribute statistics queries twice. However, as a hack, we
save the length of the written-out entry in the first pass and skip
over it in the second. While there is no known technical issue
with executing the queries multiple times and rewriting the
results, it's expensive and feels risky, so let's avoid it.
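
The length-saving trick above might be sketched as follows. This is an assumption-laden toy, not the committed implementation: SketchOut, SketchToc, sketch_query and sketch_write_toc are hypothetical names, and the output is modeled as an in-memory buffer with a write position instead of a real archive file. On the first pass the callback runs and the written length is remembered; on the second pass the writer advances past the bytes already in place without calling the callback again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer with a current write position. */
typedef struct
{
    char   buf[128];
    size_t pos;
} SketchOut;

/* Hypothetical TOC entry that remembers how long its definition was. */
typedef struct SketchToc
{
    const char *(*defnDumper) (struct SketchToc *te);
    size_t      defnLen;  /* 0 until the entry has been written once */
    int         calls;    /* how often the (expensive) callback ran */
} SketchToc;

/* Stand-in for the statistics queries; counts its invocations. */
const char *
sketch_query(SketchToc *te)
{
    te->calls++;
    return "stats-command;";
}

void
sketch_write_toc(SketchOut *out, SketchToc *te)
{
    if (te->defnLen > 0)
    {
        /* Second pass: the bytes are already in place, so skip over them. */
        out->pos += te->defnLen;
        return;
    }

    /* First pass: generate, write, and remember the length. */
    const char *defn = te->defnDumper(te);
    size_t      len = strlen(defn);

    assert(out->pos + len <= sizeof(out->buf));
    memcpy(out->buf + out->pos, defn, len);
    out->pos += len;
    te->defnLen = len;
}
```

Rewinding the position to the start and calling sketch_write_toc a second time leaves the buffer contents untouched and ends at the same position, with the callback having run only once.
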
As an exception, we _do_ execute the queries twice for the tar
format. This format does a second pass through the TOC to generate
the restore.sql file. pg_restore doesn't use this file, so even if
the second round of queries returns different results than the
first, it won't corrupt the output; the archive and restore.sql
file will just have different content. A follow-up commit will
teach pg_dump to gather attribute statistics in batches, which our
testing indicates more than makes up for the added expense of
running the queries twice.
Author: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Co-authored-by: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Reviewed-by: Jeff Davis <pgsql(at)j-davis(dot)com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/7d5c83b4e90c7156655f98b7312a30ae5eeb4d27
Modified Files
--------------
src/bin/pg_dump/pg_backup.h | 1 +
src/bin/pg_dump/pg_backup_archiver.c | 83 +++++++++++++++++++++++++++++++++++-
src/bin/pg_dump/pg_backup_archiver.h | 6 +++
src/bin/pg_dump/pg_dump.c | 46 ++++++++++++++------
4 files changed, 120 insertions(+), 16 deletions(-)