Re: list of extended statistics on psql

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>, "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: list of extended statistics on psql
Date: 2021-01-19 02:52:22
Message-ID: 914705d5-6dc0-817b-9405-891830f43841@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/19/21 1:44 AM, Tatsuro Yamada wrote:
> Hi Tomas,
>
>>>> As for how to deal with this, I can think of about three ways:
>>>>
>>>> 1) simplify the command to only print information from
>>>> pg_statistic_ext (so on information about which stats are built or
>>>> sizes)
>>>>
>>>> 2) extend pg_stats_ext with necessary information (e.g. sizes)
>>>>
>>>> 3) create a new system view, with necessary information (so that
>>>> pg_stats_ext does not need to be modified)
>>>>
>>>> 4) add functions returning the necessary information, possibly only
>>>> for statistics the user can access (similarly to what pg_stats_ext
>>>> does)
>>>>
>>>> Options 2-4 have the obvious disadvantage that this won't work on
>>>> older releases (we can't add views or functions there). So I'm
>>>> leaning towards #1 even if that means we have to remove some of the
>>>> details. We can consider adding that for new releases, though.
>>>
>>>
>>> Thanks for the useful advice. I go with option 1).
>>> The following query is created by using pg_stats_ext instead of
>>> pg_statistic_ext and pg_statistic_ext_data. However, I was confused
>>> about writing a part of the query for calculating MCV size because
>>> there are four columns related to MCV. For example, most_common_vals,
>>> most_common_val_nulls, most_common_freqs, and most_common_base_freqs.
>>> Currently, I don't know how to calculate the size of MCV by using the
>>> four columns. Thoughts? :-)
>>
>> Well, my suggestion was to use pg_statistic_ext, because that lists
>> all statistics, while pg_stats_ext is filtering statistics depending
>> on access privileges. I think that's more appropriate for \dX, the
>> contents should not change depending on the user.
>>
>> Also, let me clarify - with option (1) we'd not show the sizes at all.
>> The size of the formatted statistics may be very different from the
>> on-disk representation, so I see no point in showing it in \dX.
>>
>> We might show other stats (e.g. number of MCV items, or the fraction
>> of data represented by the MCV list), but the user can inspect
>> pg_stats_ext if needed.
>>
>> What we might do is to show those stats when a superuser is running
>> this command, but I'm not sure that's a good idea (or how difficult
>> would it be to implement).
>
>
> Thanks for clarifying.
> I see that your suggestion was to use pg_statistic_ext, not pg_stats_ext.
> And we don't need the size of stats.
>
> If that's the case, we also can't get the status of stats since PG12 or
> later
> because we can't use pg_statistic_ext_data, as you know. Therefore, it
> would be
> better to replace the query with the old query that I sent five months
> ago like this:
>
> # the old query
> SELECT
>     stxnamespace::pg_catalog.regnamespace AS "Schema",
>     stxrelid::pg_catalog.regclass AS "Table",
>     stxname AS "Name",
>     (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
>      FROM pg_catalog.unnest(stxkeys) s(attnum)
>      JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
>      a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
>     'd' = any(stxkind) AS "Ndistinct",
>     'f' = any(stxkind) AS "Dependencies",
>     'm' = any(stxkind) AS "MCV"
> FROM pg_catalog.pg_statistic_ext stat
> ORDER BY 1,2;
>
>  Schema | Table  |    Name    | Columns | Ndistinct | Dependencies | MCV
> --------+--------+------------+---------+-----------+--------------+-----
>  public | hoge1  | hoge1_ext  | a, b    | t         | t            | t
>  public | hoge_t | hoge_t_ext | a, b    | t         | t            | t
> (2 rows)
>
>
> The above query is so simple so that we would better to use the
> following query:
>
> # This query works on PG10 or later
> SELECT
>     es.stxnamespace::pg_catalog.regnamespace::text AS "Schema",
>     es.stxname AS "Name",
>     pg_catalog.format('%s FROM %s',
>         (SELECT
> pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ')
>          FROM pg_catalog.unnest(es.stxkeys) s(attnum)
>          JOIN pg_catalog.pg_attribute a
>          ON (es.stxrelid = a.attrelid
>          AND a.attnum = s.attnum
>          AND NOT a.attisdropped)),
>     es.stxrelid::regclass) AS "Definition",
>     CASE WHEN 'd' = any(es.stxkind) THEN 'defined'
>     END AS "Ndistinct",
>     CASE WHEN 'f' = any(es.stxkind) THEN 'defined'
>     END AS "Dependencies",
>     CASE WHEN 'm' = any(es.stxkind) THEN 'defined'
>     END AS "MCV"
> FROM pg_catalog.pg_statistic_ext es
> ORDER BY 1, 2;
>
>  Schema |    Name    |    Definition    | Ndistinct | Dependencies |
> Dependencies
> --------+------------+------------------+-----------+--------------+--------------
>
>  public | hoge1_ext  | a, b FROM hoge1  | defined   | defined      |
> defined
>  public | hoge_t_ext | a, b FROM hoge_t | defined   | defined      |
> defined
> (2 rows)
>
>
> I'm going to create the WIP patch to use the above queriy.
> Any comments welcome. :-D
>

Yes, I think using this simpler query makes sense. If we decide we need
something more elaborate, we can improve that by in future PostgreSQL
versions (after adding view/function to core), but I'd leave that as a
work for the future.

Apologies for all the extra work - I haven't realized this flaw when
pushing for showing more stuff :-(

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-01-19 03:01:47 Re: simplifying foreign key/RI checks
Previous Message Amit Langote 2021-01-19 02:45:29 Re: simplifying foreign key/RI checks