Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-09-05 17:29:44
Message-ID: CADkLM=dJ3grfy=PuUWgmCAfpGOyoXTJQr2UWo+FqYTimeYm3dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> I have attached version 28j as one giant patch covering what was
> previously 0001-0003. It's a bit rough (tests in particular need some
> work), but it implelements the logic to replace only those values
> specified rather than the whole tuple.
>

I like what you did restoring the parameter enums, especially now that they
can be leveraged for the expected type oids data structure.

> At least for the interactive "set" variants of the functions, I think
> it's an improvement. It feels more natural to just change one stat
> without wiping out all the others. I realize a lot of the statistics
> depend on each other, but the point is not to replace ANALYZE, the
> point is to experiment with planner scenarios. What do others think?
>

When I first heard that was what you wanted to do, I was very uneasy about
it. The way you implemented it (one function to wipe out/reset all existing
stats, and then the _set_ function works as an upsert) puts my mind at
ease. The things I really wanted to avoid were gaps in the stakind array
(which can't happen as you wrote it) and getting the stakinds out of order
(admittedly that's more a tidiness issue, but pg_statistic before/after
fidelity is kept, so I'm happy).

For the "restore" variants, I'm not sure it matters a lot because the
> stats will already be empty. If it does matter, we could pretty easily
> define the "restore" variants to wipe out existing stats when loading
> the table, though I'm not sure if that's a good thing or not.
>

I agree, and I'm leaning towards doing the clear, because "restore" to me
implies that what resides there exactly matches what was in the function
call, regardless of what might have been there before. But you're also
right, "restore" is expected to be used on default/missing stats, and the
restore_* call generated is supposed to be comprehensive of all stats that
were there at time of dump/upgrade, so impact would be minimal.

> I also made more use of FunctionCallInfo structures to communicate
> between functions rather than huge parameter lists. I believe that
> reduced the line count substantially, and made it easier to transform
> the argument pairs in the "restore" variants into the positional
> arguments for the "set" variants.
>

You certainly did, and I see where it pays off given that _set_ / _restore_
functions are just different ways of ordering the shared internal function
call.

Observation: there is currently no way to delete a stakind, keeping the
rest of the record. It's certainly possible to compose a SQL query that
gets the current values, invokes pg_clear_* and then pg_set_* using the
values that are meant to be kept, and in fact that pattern is how I
imagined the pg_set_* functions would be used when they overwrote
everything in the tuple. So I am fine with going forward with this paradigm.

The code mentions that more explanation should be given for the special
cases (tsvector, etc) and that explanation is basically "this code follows
what the corresponding custom typanalyze() function does". In the future,
it may make sense to have custom typimport() functions for datatypes that
have a custom typanalzye(), which would solve the issue of handling custom
stakinds.

I'll continue to work on this.

p.s. dropping invalid email address from the thread

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-09-05 17:33:42 Re: scalability bottlenecks with (many) partitions (and more)
Previous Message Tomas Vondra 2024-09-05 17:21:01 Re: scalability bottlenecks with (many) partitions (and more)