Re: Multivariate MCV stats can leak data to unprivileged users

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Date: 2019-06-06 20:33:08
Message-ID: 20190606203308.h6rfqzr7jhafgnzd@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached are three patches tweaking the stats - two were already posted
in this thread, the third one is just updating docs.

1) 0001 - split pg_statistic_ext into definition + data

This is pretty much the patch Dean posted some time ago, rebased to
current master (fixing just minor pgindent bitrot).

2) 0002 - update sgml docs to reflect changes from 0001

3) 0003 - define pg_stats_ext view, similar to pg_stats

The question is whether we want to also redesign pg_statistic_ext_data
per Tom's proposal (more about that later), but I think we can treat
that as an additional step on top of 0001. So I propose we get those
changes committed, and then perhaps also switch the data table to the
EAV model.

Barring objections, I'll do that early next week, after cleaning up
those patches a bit more.

One thing I think we should fix is naming of the attributes in the 0001
patch. At the moment both catalogs use "stx" prefix - e.g. "stxkind" is
in pg_statistic_ext, and "stxmcv" is in pg_statistic_ext_data. We should
probably switch to "stxd" in the _data catalog. Opinions?

Now, back to the proposal to split the _data catalog rows to EAV form,
with a new data type replacing the multiple types we have at the moment.
I've started hacking on it today, but the more I work on it the less
useful it seems to me.

My understanding is that with that approach we'd replace the _data
catalog (which currently has one column per statistic type, with a
separate data type) with 1:M generic rows, with a generic data type.
That is, we'd replace this

CREATE TABLE pg_statistic_ext_data (
stxoid OID,
stxdependencies pg_dependencies,
stxndistinct pg_ndistinct,
stxmcv pg_mcv_list,
... histograms ...
);

with something like this:

CREATE TABLE pg_statistiex_ext_data (
stxoid OID,
stxkind CHAR,
stxdata pg_statistic_ext_type
);

where pg_statistic_ext would store all existing statistic types. along
with a "flag" saying which value it actually stored (essentially a copy
of the stxkind column, which we however need to lookup a statistic of a
certain type, without having to detoast the statistic itself).

As I mentioned before, I kinda dislike the fact that this obfuscates the
actual statistic type by hiding it behing the "wrapper" type.

The other thing is that we have to deal with 1:M relationship every time
we (re)build the statistics, or when we need to access them. Now, it may
not be a huge amount of code, but it just seems unnecessary. It would
make sense if we planned to add large number of additional statistic
types, but that seems unlikely - I personally can think of maybe one new
statistic type, but that's about it.

I'll continue working on it and I'll share the results early next week,
after playing with it a bit, but I think we should get the existing
patches committed and then continue discussing this as an additional
improvement.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-split-up-pg_statistics_ext.patch text/plain 34.2 KB
0002-update-docs-after-splitting-stats.patch text/plain 6.8 KB
0003-add-pg_stats_ext-view.patch text/plain 5.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-06-06 20:40:53 tableam: abstracting relation sizing code
Previous Message David Rowley 2019-06-06 19:55:54 Re: Why does not subquery pruning conditions inherit to parent query?