Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-03-21 19:10:42
Message-ID: CADkLM=dw=fkvPRmOw74UDYnLzXwkDyMSY1k2jsU5=xXBAj7Mpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> How about just some defaults then? Many of them have a reasonable
> default, like NULL or an empty array. Some are parallel arrays and
> either both should be specified or neither (e.g.
> most_common_vals+most_common_freqs), but you can check for that.
>

+1
Default NULL has been implemented for all parameters after n_distinct.

>
> > > Why are you calling checkCanModifyRelation() twice?
> >
> > Once for the relation itself, and once for pg_statistic.
>
> Nobody has the privileges to modify pg_statistic except superuser,
> right? I thought the point of a privilege check is that users could
> modify statistics for their own tables, or the tables they maintain.
>

In which case wouldn't the checkCanModify on pg_statistic would be a proxy
for is_superuser/has_special_role_we_havent_created_yet.

>
> >
> > I can see making it void and returning an error for everything that
> > we currently return false for, but if we do that, then a statement
> > with one pg_set_relation_stats, and N pg_set_attribute_stats (which
> > we lump together in one command for the locking benefits and atomic
> > transaction) would fail entirely if one of the set_attributes named a
> > column that we had dropped. It's up for debate whether that's the
> > right behavior or not.
>
> I'd probably make the dropped column a WARNING with a message like
> "skipping dropped column whatever". Regardless, have some kind of
> explanatory comment.
>

That's certainly do-able.

>
> >
> > I pulled most of the hardcoded values from pg_stats itself. The
> > sample set is trivially small, and the values inserted were in-order-
> > ish. So maybe that's why.
>
> In my simple test, most_common_freqs is descending:
>
> CREATE TABLE a(i int);
> INSERT INTO a VALUES(1);
> INSERT INTO a VALUES(2);
> INSERT INTO a VALUES(2);
> INSERT INTO a VALUES(3);
> INSERT INTO a VALUES(3);
> INSERT INTO a VALUES(3);
> INSERT INTO a VALUES(4);
> INSERT INTO a VALUES(4);
> INSERT INTO a VALUES(4);
> INSERT INTO a VALUES(4);
> ANALYZE a;
> SELECT most_common_vals, most_common_freqs
> FROM pg_stats WHERE tablename='a';
> most_common_vals | most_common_freqs
> ------------------+-------------------
> {4,3,2} | {0.4,0.3,0.2}
> (1 row)
>
> Can you show an example where it's not?
>

Not off hand, no.

>
> >
> > Maybe we could have the functions restricted to a role or roles:
> >
> > 1. pg_write_all_stats (can modify stats on ANY table)
> > 2. pg_write_own_stats (can modify stats on tables owned by user)
>
> If we go that route, we are giving up on the ability for users to
> restore stats on their own tables. Let's just be careful about
> validating data to mitigate this risk.
>

A great many test cases coming in the next patch.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amonson, Paul D 2024-03-21 19:17:54 RE: Popcount optimization using AVX512
Previous Message Jeff Davis 2024-03-21 19:07:38 Re: Comments on Custom RMGRs