From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Multivariate MCV stats can leak data to unprivileged users |
Date: | 2019-05-16 14:41:50 |
Message-ID: | 20190516144150.vylpxqxn4fspowf7@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, May 16, 2019 at 02:28:03PM +0100, Dean Rasheed wrote:
>On Mon, 13 May 2019 at 23:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> Yeah. I suggest we add a simple pg_stats_ext view, similar to pg_stats.
>> It would:
>>
>> (1) translate the schema / relation / attribute names
>>
>> I don't see why translating column indexes to names would be fiddly.
>> It's a matter of simple unnest + join, no? Or what issues you see?
>>
>
>Yeah, you're right. I thought it would be harder than that. One minor
>thing -- I think we should have an explicit ORDER BY where we collect
>the column names, to guarantee that they're listed in the right order.
>
Good point.
>> (2) include values for ndistinct / dependencies, if built
>>
>> Those don't include any actual values, so this should be OK. You might
>> make the argument that even this does leak a bit of information (e.g.
>> when you can see values in one column, and you know there's a strong
>> functional dependence, it tells you something about the other column
>> which you may not see). But I think we kinda already leak information
>> about that through estimates, so maybe that's not an issue.
>>
>
>Hmm. For normal statistics, if the user has no permissions on the
>table, they don't get to see any of these kinds of statistics, not
>even things like n_distinct. I think we should do the same here --
>i.e., if the user has no permissions on the table, don't let them see
>anything. Such a user will not be able to run EXPLAIN on queries
>against the table, so they won't get to see any estimates, and I don't
>think they should get to see any extended statistics either.
>
OK, I haven't realized we don't show that even for normal stats.
>> (3) include MCV list only when user has access to all columns
>>
>> Essentially, if the user is missing 'select' privilege on at least one
>> of the columns, there'll be NULL. Otherwise the MCV value.
>>
>
>OK, that seems reasonable, except as I said above, I think that should
>apply to all statistics, not just the MCV lists.
>
>> The attached patch adds pg_stats_ext doing this. I don't claim it's the
>> best possible query backing the view, so any improvements are welcome.
>>
>>
>> I've been thinking we might somehow filter the statistics values, e.g. by
>> not showing values for attributes the user has no 'select' privilege on,
>> but that seems like a can of worms. It'd lead to MCV items that can't be
>> distinguished because the only difference was the removed attribute, and
>> so on. So I propose we simply show/hide the whole MCV list.
>>
>
>Agreed.
>
>> Likewise, I don't think it makes sense to expand the MCV list in this
>> view - that works for the single-dimensional case, because then the
>> list is expanded into two arrays (values + frequencies), which are easy
>> to process further. But for multivariate MCV lists that's much more
>> complex - we don't know how many attributes are there, for example.
>>
>> So I suggest we just show the pg_mcv_list value as is, and leave it up
>> to the user to call the pg_mcv_list_items() function if needed.
>>
>
>I think expanding the MCV lists is actually quite useful because then
>you can see arrays of values, nulls, frequencies and base frequencies
>in a reasonably readable form (it certainly looks better than a binary
>dump), without needing to join to a function call, which is a bit
>ugly, and unmemorable.
>
Hmmm, ok. I think my main worry here is that it may or may not work for
more complex types of extended stats that are likely to come in the
future. Although, maybe it can be made work even for that.
>The results from the attached look quite reasonable at first glance.
>It contains a few other changes as well:
>
>1). It exposes the schema, name and owner of the statistics object as
>well via the view, for completeness.
>
>2). It changes a few column names -- traditionally these views strip
>off the column name prefix from the underlying table, so I've
>attempted to be consistent with other similar views.
>
>3). I added array-valued columns for each of the MCV list components,
>which makes it more like pg_stats.
>
>4). I moved all the permission checks to the top-level WHERE clause,
>so a user needs to have select permissions on all the columns
>mentioned by the statistics, and the table mustn't have RLS in effect,
>otherwise the user won't see the row for that statistics object.
>
>5). Some columns from pg_statistic_ext have to be made visible for
>psql \d to work. Basically, it needs to be able to query for the
>existence of extended statistics, but it doesn't need to see the
>actual statistical data. Of course, we could change psql to use the
>view, but this way gives us better backwards compatibility with older
>clients.
>
>This is still going to break compatibility of any user code looking at
>stxndistinct or stxdependencies from pg_statistic_ext, but at least it
>doesn't break old versions of psql.
>
>Note: doc and test updates still to do.
>
Thanks. I'm travelling today/tomorrow, but I'll do my best to fill in the
missing bits ASAP.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2019-05-16 15:15:49 | RE: psql - add SHOW_ALL_RESULTS option |
Previous Message | Dean Rasheed | 2019-05-16 13:28:03 | Re: Multivariate MCV stats can leak data to unprivileged users |