Re: Multivariate MCV stats can leak data to unprivileged users

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Date: 2019-05-20 18:17:14
Message-ID: 20190520181714.fm4a4yjtgffh6kst@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 20, 2019 at 04:09:24PM +0100, Dean Rasheed wrote:
>On Mon, 20 May 2019 at 14:32, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> > On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >> Oh, right. It still has the disadvantage that it obfuscates the actual
>> >> data stored in the pg_stats_ext_data (or whatever would it be called),
>> >> so e.g. functions would have to do additional checks to make sure it
>> >> actually is the right statistic type. For example pg_mcv_list_items()
>> >> could not rely on receiving pg_mcv_list values, as per the signature,
>> >> but would have to check the value.
>>
>> > Yes. In fact, since the user-accessible view would want to expose
>> > datatypes specific to the stats kinds rather than bytea or cstring
>> > values, we would need SQL-callable conversion functions for each kind:
>>
>> It seems like people are willfully misunderstanding my suggestion.
>
>I'm more than capable of inadvertently misunderstanding, without the
>need to willfully do so :-)
>
>> You'd only need *one* conversion function, which would look at the
>> embedded ID field and then emit the appropriate text representation.
>> I don't see a reason why we'd have the separate pg_ndistinct etc. types
>> any more at all.
>
>Hmm, OK. So then would you also make the user-accessible view agnostic
>about the kinds of stats supported in the same way, returning zero or
>more rows per STATISTICS object, depending on how many kinds of stats
>have been built? That would have the advantage of never needing to
>change the view definition again, as more stats kinds are supported.
>

If I got Tom's proposal right, there'd be only one statistic value in
each pg_stats_ext_data value. It'd be a very thin wrapper, essentially
just the value itself + type flag. So for example if you did

CREATE STATISTICS s (ndistinct, mcv, dependencies) ...

you'd get three rows in pg_statistic_ext_data (assuming all the stats
get actually built).

>We'd need to change pg_mcv_list_items() to accept a pg_stats_ext_data
>value rather than a pg_mcv value, and it would be the user's
>responsibility to call it if they wanted to see the contents of the
>MCV list (I was originally thinking that we'd include a call to
>pg_mcv_list_items() in the view definition, so that it produced
>friendlier looking output, since the default textual representation of
>an MCV list is completely opaque, unlike the other stats kinds).
>Actually, I can see another advantage to not including
>pg_mcv_list_items() in the view definition -- in the future, we may
>dream up a better version of pg_mcv_list_items(), like say one that
>produced JSON, and then we'd regret using the current function.
>

Yeah. As I said, it obfuscates the "actual" type of the stats value, so
we can no longer rely on the function machinery to verify the type. All
functions dealing with the "wrapper" type would have to verify it
actually contains the right statistic type.

>> Anyway, it was just a suggestion, and if people don't like it that's
>> fine. But I don't want it to be rejected on the basis of false
>> arguments.
>
>To be clear, I'm not intentionally rejecting your idea. I'm merely
>trying to fully understand the implications.
>
>At this stage, perhaps it would be helpful to prototype something for
>comparison.
>

I'll look into that. I'll try to whip something up before pgcon, but I
can't guarantee that :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-05-20 18:20:01 Re: clean up docs for v12
Previous Message Jeremy Finzel 2019-05-20 18:13:50 Question about new pg12 feature no-rewrite timestamp to timestamptz conversion