From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2023-11-02 13:52:20 |
Message-ID: | 76596388-6fe6-0baf-351d-734458a46d76@enterprisedb.com |
Lists: | pgsql-hackers |
On 11/2/23 06:01, Corey Huinker wrote:
>
>
> Maybe I just don't understand, but I'm pretty sure ANALYZE does not
> derive index stats from column stats. It actually builds them from the
> row sample.
>
>
> That is correct, my error.
>
>
>
> > * now support extended statistics except for MCV, which is currently
> > serialized as a difficult-to-decompose bytea field.
>
> Doesn't pg_mcv_list_items() already do all the heavy work?
>
>
> Thanks! I'll look into that.
>
> The comment below in mcv.c made me think there was no easy way to get
> output.
>
> /*
> * pg_mcv_list_out - output routine for type pg_mcv_list.
> *
> * MCV lists are serialized into a bytea value, so we simply call byteaout()
> * to serialize the value into text. But it'd be nice to serialize that into
> * a meaningful representation (e.g. for inspection by people).
> *
> * XXX This should probably return something meaningful, similar to what
> * pg_dependencies_out does. Not sure how to deal with the deduplicated
> * values, though - do we want to expand that or not?
> */
>
Yeah, that was the simplest output function possible; it didn't seem
worth implementing anything more advanced. pg_mcv_list_items() is
more convenient for most needs, but it's quite far from the on-disk
representation.
That's actually a good question - how close should the exported data
be to the on-disk format? I'd say we should keep it abstract, not tied
to the details of the on-disk format (which might easily change between
versions).
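A minimal sketch of what an abstract, version-independent MCV export could look like, assuming input rows shaped like the output of pg_mcv_list_items() (index, values, nulls, frequency, base_frequency). The McvItem type, the mcv_items_to_json() helper and the JSON layout are hypothetical illustrations, not part of PostgreSQL or of the proposed patch.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class McvItem:
    # One row as returned by pg_mcv_list_items(); the fields mirror its
    # output columns: index, values, nulls, frequency, base_frequency.
    index: int
    values: list[Optional[str]]
    nulls: list[bool]
    frequency: float
    base_frequency: float


def mcv_items_to_json(items: list[McvItem]) -> str:
    # Hypothetical export format: one object per MCV item, in index order,
    # with NULLs written as JSON null rather than as a separate nulls array.
    # Nothing here depends on how the MCV list is serialized on disk.
    exported = []
    for item in sorted(items, key=lambda it: it.index):
        values = [None if is_null else v
                  for v, is_null in zip(item.values, item.nulls)]
        exported.append({
            "values": values,
            "frequency": item.frequency,
            "base_frequency": item.base_frequency,
        })
    return json.dumps({"mcv": exported})
```

Because the export only carries what pg_mcv_list_items() already exposes, a change to the bytea serialization would not change this format.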
I'm a bit confused about the JSON schema used in the pg_statistic_export
view, though. It simply serializes stakinds, stavalues, stanumbers into
arrays ... which works, but why not use JSON nesting? I mean, there
could be a nested document for each of histogram, MCV, ... containing
just the fields relevant to it.
{
  ...
  "histogram": { "stavalues": [...] },
  "mcv": { "stavalues": [...], "stanumbers": [...] },
  ...
}
and so on. Also, what does TRIVIAL stand for?
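The nesting suggested above could be sketched roughly like this, assuming the flat export carries parallel "stakinds", "stavalues" and "stanumbers" arrays with one entry per pg_statistic slot (kind 0 meaning an unused slot). The kind codes follow the STATISTIC_KIND_* constants in PostgreSQL's pg_statistic.h; the key names and the nest_stat_slots() helper are hypothetical.

```python
import json
from typing import Any

# Kind codes from PostgreSQL's pg_statistic.h (STATISTIC_KIND_*).
STATISTIC_KINDS = {
    1: "mcv",
    2: "histogram",
    3: "correlation",
    4: "mcelem",
    5: "dechist",
    6: "range_length_histogram",
    7: "bounds_histogram",
}


def nest_stat_slots(flat: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical conversion: turn the parallel per-slot arrays into one
    # nested object per statistic kind, keeping only the fields that slot
    # actually uses (e.g. a histogram has stavalues but no stanumbers).
    nested: dict[str, Any] = {}
    for kind, values, numbers in zip(
        flat["stakinds"], flat["stavalues"], flat["stanumbers"]
    ):
        if kind == 0:  # unused slot
            continue
        entry: dict[str, Any] = {}
        if values is not None:
            entry["stavalues"] = values
        if numbers is not None:
            entry["stanumbers"] = numbers
        # Unknown (e.g. extension-defined) kinds keep their numeric code.
        nested[STATISTIC_KINDS.get(kind, f"kind_{kind}")] = entry
    return nested


if __name__ == "__main__":
    flat = {
        "stakinds": [1, 2, 0, 0, 0],
        "stavalues": [["a", "b"], ["a", "m", "z"], None, None, None],
        "stanumbers": [[0.5, 0.3], None, None, None, None],
    }
    print(json.dumps(nest_stat_slots(flat), indent=2))
```

Keying by kind name also means the slot position (1 to 5) does not leak into the exported document, which fits keeping the format abstract.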
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company