pgsql: pg_dump: Reduce memory usage of dumps with statistics.

From: Nathan Bossart <nathan(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: pg_dump: Reduce memory usage of dumps with statistics.
Date: 2025-04-04 19:51:56
Message-ID: E1u0n52-002gUJ-2M@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

pg_dump: Reduce memory usage of dumps with statistics.

Right now, pg_dump stores all generated commands for statistics in
memory. These commands can be quite large and therefore can
significantly increase pg_dump's memory footprint. To fix, wait
until we are about to write out the commands before generating
them, and be sure to free the commands after writing. This is
implemented via a new defnDumper callback that works much like the
dataDumper one but is specifically designed for TOC entries.

Custom dumps that include data might write the TOC twice (to update
data offset information), which would ordinarily cause pg_dump to
run the attribute statistics queries twice. However, as a hack, we
save the length of the written-out entry in the first pass and skip
over it in the second. While there is no known technical issue
with executing the queries multiple times and rewriting the
results, it's expensive and feels risky, so let's avoid it.

As an exception, we _do_ execute the queries twice for the tar
format. This format does a second pass through the TOC to generate
the restore.sql file. pg_restore doesn't use this file, so even if
the second round of queries returns different results than the
first, it won't corrupt the output; the archive and restore.sql
file will just have different content. A follow-up commit will
teach pg_dump to gather attribute statistics in batches, which our
testing indicates more than makes up for the added expense of
running the queries twice.

Author: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Co-authored-by: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Reviewed-by: Jeff Davis <pgsql(at)j-davis(dot)com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7d5c83b4e90c7156655f98b7312a30ae5eeb4d27

Modified Files
--------------
src/bin/pg_dump/pg_backup.h | 1 +
src/bin/pg_dump/pg_backup_archiver.c | 83 +++++++++++++++++++++++++++++++++++-
src/bin/pg_dump/pg_backup_archiver.h | 6 +++
src/bin/pg_dump/pg_dump.c | 46 ++++++++++++++------
4 files changed, 120 insertions(+), 16 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Nathan Bossart 2025-04-04 19:55:47 pgsql: Remove unused function parameters in pg_backup_archiver.c.
Previous Message Melanie Plageman 2025-04-04 19:29:26 pgsql: Remove superfluous autoprewarm check