Re: Extended Statistics set/restore/clear functions.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Extended Statistics set/restore/clear functions.
Date: 2025-01-23 20:52:55
Message-ID: CADkLM=fA41UM2b5Fk8fSsCh-gCn_Q8UfP7bHJCbNGmZCG+USxQ@mail.gmail.com
Lists: pgsql-hackers

> > * no negative attnums in key list
>

Disregard this suggestion - negative attnums mean the Nth expression in the
extended stats object, though it boggles the mind how we could have 222
expressions...

> > * no duplicate attnums in key list
>

This one is still live; I'm considering it.

> At this point I was really thinking only about validating the attnums,
> i.e. to make sure it's a valid attribute in the table / statistics. That
> is something the pg_set_attribute_stats() enforce too, thanks to having
> a separate argument for the attribute name.
>
> That's where I'd stop. I don't want to do checks on the statistics
> content, like verifying the frequencies in the MCV sum up to 1.0 or
> stuff like that. I think we're not doing that for pg_set_attribute_stats
>

Agreed.

> either (and I'd bet one could cause a lot of "fun" this way).
>

If by "fun" you mean "create a fuzzing tool", then yes.

As an aside, the "big win" in all these functions is the ability to dump a
database with --no-data but keep all the schema and statistics. That allows
checking query plans for existing databases with sensitive data without
actually exposing the data (except the MCV lists, obviously) or spending
the I/O to load that data.

> Understood. IMHO it's fine to say we're not validating the statistics
> are "consistent" but I think we should check it matches the definition.
>

+1
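
To make "matches the definition" concrete, here is a rough sketch of the
check under discussion, written in Python rather than the C the patch would
use. Every name in it (validate_ndistinct_keys, stat_attnums, n_expressions)
is hypothetical, and it assumes, as noted above, that a negative attnum -N
refers to the Nth expression of the statistics object. It looks only at the
key lists and never at the statistic values themselves.

```python
def validate_ndistinct_keys(keys, stat_attnums, n_expressions):
    """Check attnum key lists against a statistics object's definition.

    keys: list of attnum lists, one list per ndistinct item.
    stat_attnums: set of positive column attnums in the statistics object.
    n_expressions: number of expressions; expression N is written as -N
                   (assumption taken from the discussion above).
    Returns a list of error strings; empty means every key list is valid.
    """
    errors = []
    for key in keys:
        # The still-open suggestion: no duplicate attnums in a key list.
        if len(set(key)) != len(key):
            errors.append(f"duplicate attnum in key list {key}")
        for attnum in key:
            if attnum == 0:
                errors.append(f"attnum 0 is not valid in key list {key}")
            elif attnum > 0 and attnum not in stat_attnums:
                errors.append(f"attnum {attnum} is not in the statistics object")
            elif attnum < 0 and -attnum > n_expressions:
                errors.append(f"attnum {attnum} refers to a missing expression")
    return errors
```

For example, validate_ndistinct_keys([[1, -222]], {1}, 2) reports the -222
key, because the object only has two expressions, while the frequencies or
ndistinct values attached to the keys are never inspected.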

> > I suppose someone could write the following utility functions
> >
> > pg_xlat_ndistinct_to_attnames(relation reloid, ndist pg_ndistinct) -> json
> > pg_xlat_ndistinct_from_attnames(relation reloid, ndist json) -> pg_ndistinct
> >
> > and that would bridge the gap for the special case where you want to
> > adapt pg_ndistinct from one table structure to a slightly different one.
> >
> >
>
> OK
>

As they'll be pure-SQL functions, I'll likely post the definitions here,
but not put them into a patch unless it draws interest.
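
Until then, the translation those two functions would perform can be
sketched as follows. This is Python rather than SQL, purely for
illustration, and it rests on two assumptions: that the pg_ndistinct text
form is a JSON object keyed by comma-separated attnums, e.g. {"1, 2": 4},
and that the attnum/attname mapping is passed in as a plain dict standing
in for a pg_attribute lookup. The Python function names are hypothetical.

```python
import json


def ndistinct_to_attnames(ndist_text, attnum_to_name):
    """Rewrite attnum-keyed ndistinct items as attname-keyed items."""
    named = {}
    for key, value in json.loads(ndist_text).items():
        attnums = [int(part) for part in key.split(",")]
        names = [attnum_to_name[attnum] for attnum in attnums]
        named[", ".join(names)] = value
    return named


def ndistinct_from_attnames(named, name_to_attnum):
    """Rewrite attname-keyed items back to attnum keys for a target table."""
    numbered = {}
    for key, value in named.items():
        names = [part.strip() for part in key.split(",")]
        attnums = [name_to_attnum[name] for name in names]
        numbered[", ".join(str(attnum) for attnum in attnums)] = value
    return json.dumps(numbered)


# Adapt statistics from a table where (a, b, c) are attnums (1, 2, 3)
# to a slightly different table where the same columns are (2, 3, 5).
named = ndistinct_to_attnames('{"1, 2": 4, "1, 3": 7}', {1: "a", 2: "b", 3: "c"})
adapted = ndistinct_from_attnames(named, {"a": 2, "b": 3, "c": 5})
```

The sketch ignores expressions (negative attnums have no attribute name to
map to) and column names that themselves contain commas; real versions
would have to decide how to handle both.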

> > For that matter, it might make sense to break out the expressions code
> > into its own file, because every other stat attribute has its own.
> > Thoughts on that?
> >
>
> +1 to that, if it reduced unnecessary code duplication
>

I'm uncertain that it actually would deduplicate any code, but I'll
certainly try.
