Re: Multivariate MCV stats can leak data to unprivileged users

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Date: 2019-05-19 17:39:56
Message-ID: CAEZATCXeP6_8C_k7ai5_xGg_e0+u6f=DTATONxjftOFxK845Zg@mail.gmail.com
Lists: pgsql-hackers

On Sun, 19 May 2019 at 15:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > I wonder ... another way we could potentially do this is
>
> > create table pg_statistic_ext_data(
> > stxoid oid, -- OID of owning pg_statistic_ext entry
> > stxkind char, -- what kind of data
> > stxdata bytea -- the data, in some format or other
> > );
>
> > The advantage of this way is that we'd not have to rejigger the
> > catalog's rowtype every time we think of a new kind of extended
> > stats. The disadvantage is that manual inspection of the contents
> > of an entry would become much harder, for lack of any convenient
> > output function.
>
> No, wait, scratch that. We could fold the three existing types
> pg_ndistinct, pg_dependencies, pg_mcv_list into one new type, say
> "pg_stats_ext_data", where the actual storage would need to have an
> ID field (so we'd waste a byte or two duplicating the externally
> visible stxkind field inside stxdata). The output function for this
> type is just a switch over the existing code. The big advantage of
> this way compared to the current approach is that adding a new
> ext-stats type requires *zero* work in the way of adding new catalog entries.
> Just add another switch case in pg_stats_ext_data_out() and you're
> done.
>

This feels a little over-engineered to me. Presumably there'd be a
compound key on (stxoid, stxkind) and we'd have to scan multiple rows
to get all the applicable stats, whereas currently they're all in one
row. And then the user-accessible view would probably need separate
sub-queries for each stats kind.
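
A rough SQL sketch of that point, under stated assumptions: demo_stat_ext, demo_stat_ext_data and demo_stats_ext are hypothetical scratch objects (not real catalogs). They are modelled on the one-row-per-kind layout proposed above, using the stxkind codes that pg_statistic_ext uses ('d' ndistinct, 'f' dependencies, 'm' MCV list). With that layout, a user-facing view that puts every kind in one row needs one sub-query per kind.

```sql
-- Hypothetical stand-in for pg_statistic_ext (one row per statistics object).
CREATE TEMP TABLE demo_stat_ext (
    id      oid PRIMARY KEY,
    stxname text
);

-- Hypothetical stand-in for the proposed pg_statistic_ext_data:
-- one row per (statistics object, kind), hence the compound key.
CREATE TEMP TABLE demo_stat_ext_data (
    stxoid  oid,      -- id of owning demo_stat_ext entry
    stxkind "char",   -- 'd' = ndistinct, 'f' = dependencies, 'm' = MCV
    stxdata bytea,    -- serialized statistics, opaque here
    PRIMARY KEY (stxoid, stxkind)
);

-- Sample data: one statistics object with ndistinct and MCV data only.
INSERT INTO demo_stat_ext VALUES (1, 's1');
INSERT INTO demo_stat_ext_data VALUES (1, 'd', '\x01'), (1, 'm', '\x02');

-- The view needs a separate sub-query per kind to rebuild a single row.
CREATE TEMP VIEW demo_stats_ext AS
SELECT s.stxname,
       (SELECT d.stxdata FROM demo_stat_ext_data d
         WHERE d.stxoid = s.id AND d.stxkind = 'd') AS ndistinct,
       (SELECT d.stxdata FROM demo_stat_ext_data d
         WHERE d.stxoid = s.id AND d.stxkind = 'f') AS dependencies,
       (SELECT d.stxdata FROM demo_stat_ext_data d
         WHERE d.stxoid = s.id AND d.stxkind = 'm') AS mcv
FROM demo_stat_ext s;
```

By contrast, when all kinds live in one catalog row, the equivalent view is a plain single-table select with no per-kind lookups.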

If the point is just to avoid adding columns to the catalog in future
releases, I'm not sure it's worth the added complexity. We will probably
add histogram stats in a future release. I'm not sure
how many more kinds we'll end up adding, but it doesn't seem likely to
be a huge number. I think we'll add far more columns to other catalog
tables as we add new features to each release.

Regards,
Dean
