Re: Statistics Import and Export

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-04-02 05:44:19
Message-ID: ec43b87247fd8700c6fcf4fe0b68cdb2fafecf2d.camel@j-davis.com
Lists: pgsql-hackers

On Tue, 2025-04-01 at 22:21 -0500, Nathan Bossart wrote:
> It certainly feels risky.  I was able to avoid executing the queries twice
> in all cases by saving the definition length in the TOC entry and skipping
> that many bytes the second time round.

That feels like a better approach.

> That's simple enough, but it relies on various assumptions such as
> fseeko() being available (IIUC the file will only be open for writing so
> we cannot fall back on fread()) and WriteStr() returning an accurate value
> (which I'm skeptical of because some formats compress this data).  But
> AFAICT custom format is the only format that does a second WriteToc() pass
> at the moment, and it only does so when fseeko() is usable.

Even with those assumptions, I think it's much better than querying
twice and assuming that the results are the same.
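
To make the shape of that approach concrete, here is a minimal sketch. It is
not pg_dump code: DemoEntry, demo_write_toc(), demo_rewrite_toc() and the
on-disk layout are hypothetical stand-ins, and the only real API it leans on
is POSIX fseeko(). The first pass records how many bytes each definition
took; the second pass rewrites the offsets and seeks past the definition
instead of producing it again.

```c
#define _POSIX_C_SOURCE 200809L	/* for fseeko() and off_t */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a TOC entry; not pg_dump's TocEntry. */
typedef struct DemoEntry
{
	const char *defn;			/* definition text, costly to regenerate */
	size_t		defn_len;		/* bytes written for defn during pass one */
	off_t		data_offset;	/* only known once the data has been written */
} DemoEntry;

static int	demo_defn_fetches = 0;	/* times a definition was produced */

/* Stands in for running the query that builds the definition. */
static const char *
demo_fetch_defn(const DemoEntry *te)
{
	demo_defn_fetches++;
	return te->defn;
}

/* Pass one: placeholder offset, then the definition; remember its length. */
static int
demo_write_toc(FILE *fp, DemoEntry *te, int n)
{
	const off_t unknown = -1;

	assert(fp != NULL);
	for (int i = 0; i < n; i++)
	{
		const char *defn = demo_fetch_defn(&te[i]);
		size_t		len = strlen(defn);

		if (fwrite(&unknown, sizeof(unknown), 1, fp) != 1 ||
			fwrite(defn, 1, len, fp) != len)
			return -1;
		te[i].defn_len = len;
	}
	return 0;
}

/* Pass two: overwrite each offset, then seek past the saved definition. */
static int
demo_rewrite_toc(FILE *fp, const DemoEntry *te, int n)
{
	assert(fp != NULL);
	if (fseeko(fp, 0, SEEK_SET) != 0)
		return -1;
	for (int i = 0; i < n; i++)
	{
		if (fwrite(&te[i].data_offset, sizeof(off_t), 1, fp) != 1 ||
			fseeko(fp, (off_t) te[i].defn_len, SEEK_CUR) != 0)
			return -1;
	}
	return 0;
}

/* Usage sketch: run both passes on a temp file; 0 means the file checks out. */
static int
demo_roundtrip(void)
{
	DemoEntry	te[2] = {{"CREATE TABLE a ();", 0, 0}, {"SELECT 1;", 0, 0}};
	FILE	   *fp = tmpfile();	/* read/write only so the demo can verify */
	off_t		off;
	char		buf[32];

	if (fp == NULL || demo_write_toc(fp, te, 2) != 0)
		return -1;
	te[0].data_offset = 100;	/* pretend the data has now been written */
	te[1].data_offset = 200;
	if (demo_rewrite_toc(fp, te, 2) != 0)
		return -1;
	rewind(fp);
	for (int i = 0; i < 2; i++)
	{
		if (fread(&off, sizeof(off), 1, fp) != 1 || off != te[i].data_offset)
			return -1;
		if (fread(buf, 1, te[i].defn_len, fp) != te[i].defn_len ||
			memcmp(buf, te[i].defn, te[i].defn_len) != 0)
			return -1;
	}
	fclose(fp);
	return 0;
}
```

The sketch only works if the saved length equals the number of bytes that
actually landed in the file, which is exactly the WriteStr() accuracy
assumption discussed above.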

>   Plus, custom format doesn't appear to compress anything written
> via WriteStr().

If WriteStr() were doing compression, that would make the second
WriteToc() pass to update the data offsets scary even in the existing
code.

> We might be able to improve this by inventing a new callback that fails for
> all formats except for custom with fseeko() available.  That would at least
> ensure hard failures if these assumptions change.  That probably wouldn't
> be terribly invasive.  I'm curious what you think.

That sounds fine. I'd say do that if it feels reasonable; if the extra
callbacks get too messy, we can just document the assumptions instead.

>
> Hm.  One thing we could do is to send the TocEntry to the callback and
> verify that it matches the one we were expecting to see next (as set by a
> previous call).  Does that sound like a strong enough check?

Again, I'd just be practical here and do the check if it feels natural,
and if not, improve the comments so that someone modifying the code
would know where to look.
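
For reference, the check being discussed could look roughly like the sketch
below. All names (DemoTocItem, DemoCheckState, demo_check_next()) are
hypothetical and are not the pg_dump structures; the state simply remembers
which entry a previous call said should come next and rejects anything else.

```c
#include <stddef.h>

/* Hypothetical, simplified entry: just an id and a link to the next one. */
typedef struct DemoTocItem
{
	int			dump_id;
	const struct DemoTocItem *next;
} DemoTocItem;

/* State carried between callback invocations. */
typedef struct DemoCheckState
{
	const DemoTocItem *expected;	/* set by the previous call */
} DemoCheckState;

/*
 * Returns 0 and advances the expectation if te is the entry we expected,
 * or -1 otherwise.  Real code would presumably fail hard instead.
 */
static int
demo_check_next(DemoCheckState *st, const DemoTocItem *te)
{
	if (te == NULL || st->expected != te)
		return -1;
	st->expected = te->next;
	return 0;
}

/* Usage sketch: 0 when entries arrive in order, -1 when they do not. */
static int
demo_replay(int out_of_order)
{
	DemoTocItem b = {2, NULL};
	DemoTocItem a = {1, &b};
	DemoCheckState st = {&a};

	if (out_of_order)
		return demo_check_next(&st, &b);	/* b arrives before a */
	if (demo_check_next(&st, &a) != 0)
		return -1;
	return demo_check_next(&st, &b);
}
```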

Regards,
Jeff Davis
