Re: Statistics Import and Export

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-04-01 02:33:15
Message-ID: Z-tQa5zsVkcCyYin@nathan
Lists: pgsql-hackers

On Mon, Mar 31, 2025 at 11:11:47AM -0400, Corey Huinker wrote:
> In light of v11-0001 being committed as 4694aedf63bf, I've rebased the
> remaining patches.

I spent the day preparing these for commit. A few notes:

* I've added a new prerequisite patch that skips the second WriteToc() call
for custom-format dumps that do not include data. After some testing and
code analysis, I haven't identified any examples where this produces
different output. This doesn't help much on its own, but it will become
rather important when we move the attribute statistics queries to happen
within WriteToc() in 0002.

* I was a little worried about the correctness of 0002 for dumps that run
the attribute statistics queries twice, but I couldn't identify any
problems here either.

* I removed a lot of miscellaneous refactoring that seemed unnecessary for
these patches. Let's move that to another patch set and keep these as
simple as possible.

* I made a small adjustment to the TOC scan restarting logic in
fetchAttributeStats(). Specifically, we now only allow the scan to
restart once for custom-format dumps that include data.

* While these patches help decrease pg_dump's memory footprint, I believe
pg_restore still reads the entire TOC into memory. That's not this patch
set's problem, but I think it's still an important consideration for the
bigger picture.
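
As a rough illustration of the restart-once rule described in the fourth bullet, here is a minimal C sketch. All names in it (SketchEntry, SketchScan, sketch_scan_init, sketch_scan_find) are hypothetical and are not taken from the patches or from pg_dump; the real fetchAttributeStats() works on pg_dump's own structures and differs in detail. The sketch only assumes a forward-only cursor over a list of entries that may rewind to the start at most once, and only when the dump is custom-format and includes data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TOC entry; not pg_dump's real TocEntry. */
typedef struct SketchEntry
{
	int			id;
} SketchEntry;

/* Hypothetical forward-only scan state over an array of entries. */
typedef struct SketchScan
{
	const SketchEntry *entries;
	size_t		nentries;
	size_t		next;			/* index where the next lookup starts */
	int			restarts_left;	/* 1 if one rewind is allowed, else 0 */
} SketchScan;

static void
sketch_scan_init(SketchScan *scan, const SketchEntry *entries,
				 size_t nentries, bool custom_format, bool dumps_data)
{
	assert(entries != NULL || nentries == 0);

	scan->entries = entries;
	scan->nentries = nentries;
	scan->next = 0;

	/* Assumption: only custom-format dumps with data walk the list twice. */
	scan->restarts_left = (custom_format && dumps_data) ? 1 : 0;
}

/*
 * Search forward from the cursor for the entry with the given id.  If the
 * end is reached and a restart is still allowed, rewind once and continue.
 * Returns NULL if the entry cannot be found under those rules.
 */
static const SketchEntry *
sketch_scan_find(SketchScan *scan, int id)
{
	for (;;)
	{
		while (scan->next < scan->nentries)
		{
			const SketchEntry *e = &scan->entries[scan->next++];

			if (e->id == id)
				return e;
		}

		if (scan->restarts_left <= 0)
			return NULL;

		scan->restarts_left--;
		scan->next = 0;
	}
}
```

With this shape, a second pass over the same entries succeeds only when the scan was initialized for a custom-format dump with data; in every other case a lookup behind the cursor returns NULL instead of rescanning.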

Regarding whether pg_dump should dump statistics by default, my current
thinking is that it shouldn't, but I think we _should_ have pg_upgrade
dump/restore statistics by default because that is arguably the most
important use-case. This is more a gut feeling than anything, so I reserve
the right to change my opinion.

My goal is to commit the attached patches on Friday morning, but of course
that is subject to change based on any feedback or objections that emerge
in the meantime.

--
nathan

Attachment Content-Type Size
v12n-0001-Skip-second-WriteToc-for-custom-format-dumps-wi.patch text/plain 1.6 KB
v12n-0002-pg_dump-Reduce-memory-usage-of-dumps-with-stati.patch text/plain 8.1 KB
v12n-0003-pg_dump-Batch-queries-for-retrieving-attribute-.patch text/plain 8.0 KB
