Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-02-24 18:47:19
Message-ID: CADkLM=eWSv2z_DVB=psfwHitq7DbFJQaP9A46R27eARC-4OYtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 24, 2025 at 12:51 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2025-02-24 05:11:48 -0500, Corey Huinker wrote:
> >> * relpages/reltuples/relallvisible are now char[32] buffers in
> RelStatsInfo
> >> and nowhere else (existing relpages conversion remains, however)
>
> > I don't see the point. This will use more memory and if we can't get
> > conversions between integers and strings right we have much bigger
> > problems. The same code was used in the backend too!
>
> I don't like that either. But there's a bigger problem with 0002:
> it's still got mostly table-driven output. I've been working on
> fixing the problem discussed over in the -committers thread about how
> we need to identify index-expression columns by number not name [1].
>

There doesn't seem to be any way around it, but it will slightly complicate
the dump-ing side of things, in that we need to either:

a) switch to attnums for index expressions and keep attname calls for
everything else.

b) track what the attnum will be on the destination side, which will be
different when we're not doing a binary upgrade and there are any preceding
dropped columns.

The patch Tom provided opens the door for option "a", and I'm inclined to
take it.

> It's not too awful in the backend (WIP patch attached), but
> getting appendAttStatsImport to do it seems like a complete disaster,
> and this patch fails to make that any easier. It'd be much better
> if you gave up on that table-driven business and just open-coded the
> handling of the successive output values as was discussed upthread.
>

Can do.

> I don't think the table-driven approach has anything to recommend it
> anyway. It requires keeping att_stats_arginfo[] in sync with the
> query in getAttStatsExportQuery, an extremely nonobvious (and
> undocumented) connection. Personally I would nuke the separate
> getAttStatsExportQuery and appendAttStatsImport functions altogether,
> and have one function that executes a query and immediately interprets
> the PGresult.
>

+1, though that comes at the cost of shutting off the possibility of a mass
fetch from pg_stats without also rendering the pg_restore_attribute_stats
calls at the same time.

>
> Also, while working on the attached, I couldn't help forming the
> opinion that we'd be better off to nuke pg_set_attribute_stats()
> from orbit and require people to use pg_restore_attribute_stats().
> pg_set_attribute_stats() would be fine if we had a way to force
> people to call it with only named-argument notation, but we don't.
> So I'm afraid that its existence will encourage people to rely
> on a specific parameter order, and then they'll whine if we
> add/remove/reorder parameters, as indeed I had to do below.
>

They've always had split goals. To recap for people just joining the show,
the "set" family had the following properties:

1. transactional, even for pg_class
2. assumes all stats given are relevant and correct for current db version
3. guaranteed to ERROR if any parameter doesn't check out
4. unstable call signature, can and will change to match the realities of
the current version
5. intended for planner experiments and fuzzing

and the "restore" family has the following properties:

1. will inplace update pg_class to avoid table bloat
2. states the version from whence the stats came, so that adjustments can
be made to suit the current db version, up to and including rejecting that
particular statistic
3. attempts to sidestep errors with WARNINGs so as not to kill a restore
4. stable but highly fluid kwargs-ish call signature
5. intended to be machine generated and used only in restore/upgrade

The attnum change certainly throws a wrench into that, and if we get rid of
the setter functions then we will need to (re)introduce parameters to
indicate our choice for properties 1 and 3. I suppose we could use the
existence or non-existence of the "version" parameter as an indicator of
which mode we want (if it exists, we want WARNINGS and inplace updates, if
not we want pure transactional and ERROR at the first problem), but I'm not
certain that proxy will hold true in the future.

> BTW, I pushed the 0003 patch with minor adjustments.
>

Thanks!

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2025-02-24 18:54:43 Re: Statistics Import and Export
Previous Message Jeff Davis 2025-02-24 18:42:40 Re: Statistics Import and Export