Re: list of extended statistics on psql

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: list of extended statistics on psql
Date: 2021-01-18 07:31:56
Message-ID: e5268d6a-c361-e017-4eb7-175a2f3fd8fc@nttcom.co.jp_1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas and Shinoda-san,

On 2021/01/17 23:31, Tomas Vondra wrote:
>
>
> On 1/17/21 3:01 AM, Tomas Vondra wrote:
>> On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
>>> Hi, hackers.
>>>
>>> I tested this committed feature.
>>> It doesn't seem to be available to non-superusers due to the inability to access pg_statistics_ext_data.
>>> Is this the expected behavior?

Ugh. I overlooked the test to check the case of the user hasn't Superuser privilege. The user without the privilege was able to access pg_statistics_ext. Therefore I supposed that it's also able to access pg_statics_ext_data. Oops.

>> Hmmm, that's a good point. Bummer we haven't noticed that earlier.
>>
>> I wonder what the right fix should be - presumably we could do something like pg_stats_ext (we can't use that view directly, because it formats the data, so the sizes are different).
>>
>> But should it list just the stats the user has access to, or should it list everything and leave the inaccessible fields NULL?
>>
>
> I've reverted the commit - once we find the right way to handle this, I'll get it committed again.
>
> As for how to deal with this, I can think of about three ways:
>
> 1) simplify the command to only print information from pg_statistic_ext (so on information about which stats are built or sizes)
>
> 2) extend pg_stats_ext with necessary information (e.g. sizes)
>
> 3) create a new system view, with necessary information (so that pg_stats_ext does not need to be modified)
>
> 4) add functions returning the necessary information, possibly only for statistics the user can access (similarly to what pg_stats_ext does)
>
> Options 2-4 have the obvious disadvantage that this won't work on older releases (we can't add views or functions there). So I'm leaning towards #1 even if that means we have to remove some of the details. We can consider adding that for new releases, though.

Thanks for the useful advice. I go with option 1).
The following query is created by using pg_stats_ext instead of pg_statistic_ext and pg_statistic_ext_data. However, I was confused
about writing a part of the query for calculating MCV size because
there are four columns related to MCV. For example, most_common_vals, most_common_val_nulls, most_common_freqs, and most_common_base_freqs.
Currently, I don't know how to calculate the size of MCV by using the
four columns. Thoughts? :-)

===================================================
\connect postgres hoge
create table hoge_t(a int, b int);
insert into hoge_t select i,i from generate_series(1,100) i;
create statistics hoge_t_ext on a, b from hoge_t;

SELECT
es.statistics_schemaname AS "Schema",
es.statistics_name AS "Name",
pg_catalog.format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(s.attname),', ')
FROM pg_catalog.unnest(es.attnames) s(attname)),
es.tablename) AS "Definition",
CASE WHEN es.n_distinct IS NOT NULL THEN 'built'
WHEN 'd' = any(es.kinds) THEN 'requested'
END AS "Ndistinct",
CASE WHEN es.dependencies IS NOT NULL THEN 'built'
WHEN 'f' = any(es.kinds) THEN 'requested'
END AS "Dependencies",
CASE WHEN es.most_common_vals IS NOT NULL THEN 'built'
WHEN 'm' = any(es.kinds) THEN 'requested'
END AS "MCV",
CASE WHEN es.n_distinct IS NOT NULL THEN
pg_catalog.pg_size_pretty(pg_catalog.length(es.n_distinct)::bigint)
WHEN 'd' = any(es.kinds) THEN '0 bytes'
END AS "Ndistinct_size",
CASE WHEN es.dependencies IS NOT NULL THEN
pg_catalog.pg_size_pretty(pg_catalog.length(es.dependencies)::bigint)
WHEN 'f' = any(es.kinds) THEN '0 bytes'
END AS "Dependencies_size"
FROM pg_catalog.pg_stats_ext es
ORDER BY 1, 2;

-[ RECORD 1 ]-----+-----------------
Schema | public
Name | hoge_t_ext
Definition | a, b FROM hoge_t
Ndistinct | requested
Dependencies | requested
MCV | requested
Ndistinct_size | 0 bytes
Dependencies_size | 0 bytes

analyze hoge_t;

-[ RECORD 1 ]-----+-----------------
Schema | public
Name | hoge_t_ext
Definition | a, b FROM hoge_t
Ndistinct | built
Dependencies | built
MCV | built
Ndistinct_size | 13 bytes
Dependencies_size | 40 bytes
===================================================

Thanks,
Tatsuro Yamada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tang, Haiying 2021-01-18 07:32:24 RE: Parallel INSERT (INTO ... SELECT ...)
Previous Message Tatsuro Yamada 2021-01-18 07:24:58 Re: list of extended statistics on psql