Re: Statistics Import and Export

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-04-03 02:26:32
Message-ID: Z-3x2AnPCP331JA3@nathan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 01, 2025 at 10:44:19PM -0700, Jeff Davis wrote:
> On Tue, 2025-04-01 at 22:21 -0500, Nathan Bossart wrote:
>> We might be able to improve this by inventing a new callback that fails for
>> all formats except for custom with feesko() available.  That would at least
>> ensure hard failures if these assumptions change.  That problably wouldn't
>> be terribly invasive.  I'm curious what you think.
>
> That sounds fine, I'd say do that if it feels reasonable, and if the
> extra callbacks get too messy, we can just document the assumptions
> instead.

I did write a version with callbacks, but it felt a bit silly because it is
very obviously intended for this one case. So, I removed them in the
attached patch set.

>> Hm.  One thing we could do is to send the TocEntry to the callback and
>> verify that matches the one we were expecting to see next (as set by a
>> previous call).  Does that sound like a strong enough check?
>
> Again, I'd just be practical here and do the check if it feels natural,
> and if not, improve the comments so that someone modifying the code
> would know where to look.

Okay, here is an updated patch set. I did add some verification code,
which ended up being a really good idea because it revealed a couple of
cases we weren't handling:

* Besides custom format calling WriteToc() twice to update the data
offsets, tar format calls WriteToc() followed by RestoreArchive() to
write restore.sql. I couldn't think of a great way to avoid executing
the queries twice in this case, so I settled on allowing it for only that
mode. While we don't expect the second set of queries to result in
different stats definitions, even if it did, the worst case is that the
content of restore.sql (which isn't used by pg_restore) would be
different. I noticed some past discussion that seems to suggest that
this format might be a candidate for deprecation [0], so I'm not sure
it's worth doing anything fancier.

* Our batching code assumes that stats entries are dumped in TOC order,
which unfortunately wasn't true for formats that use RestoreArchive() for
dumping. This is because RestoreArchive() does multiple passes through
the TOC and selectively dumps certain entries each time. This is
particularly troublesome for index stats and a subset of matview stats;
both are in SECTION_POST_DATA, but matview stats that depend on matview
data are dumped in RESTORE_PASS_POST_ACL, while all other stats data is
dumped in RESTORE_PASS_MAIN. To deal with this, I propose moving all
stats entries in SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which
ensures that we always dump stats in TOC order. One convenient side
effect of this change is that we can revert a decent chunk of commit
a0a4601765. It might be possible to do better via smarter lookahead code
or a more sophisticated cache, but it's a bit late in the game for that.

[0] https://postgr.es/m/20180727015306.fzlo4inv5i3zqr2c%40alap3.anarazel.de

--
nathan

Attachment Content-Type Size
v12n5-0001-Skip-second-WriteToc-for-custom-format-dumps-w.patch text/plain 1.7 KB
v12n5-0002-pg_dump-Reduce-memory-usage-of-dumps-with-stat.patch text/plain 11.0 KB
v12n5-0003-pg_dump-Retrieve-attribute-statistics-in-batch.patch text/plain 16.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2025-04-03 02:34:58 Re: Statistics Import and Export
Previous Message Zhijie Hou (Fujitsu) 2025-04-03 02:20:34 RE: Fix slot synchronization with two_phase decoding enabled