Bogosities in pg_dump's extended statistics support

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Bogosities in pg_dump's extended statistics support
Date: 2018-02-11 05:56:46
Message-ID: 18272.1518328606@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While poking around in pg_dump for another purpose, I happened to notice
these things about its handling of extended-statistics objects:

1. pg_dump supposes that a stats object must be in the same schema as
the table it's on. This is flat wrong.

regression=# create schema s1;
CREATE SCHEMA
regression=# create table s1.mytab (f1 int, f2 int);
CREATE TABLE
regression=# create schema s2;
CREATE SCHEMA
regression=# create statistics s2.mystats on f1, f2 from s1.mytab;
CREATE STATISTICS

Even if this weren't flat wrong already, it's not going to scale to the
intended expansion of stats objects to describe multi-table stats.

2. pg_dump supposes that a stats object must have the same owner as
the table it's on. This is also flat wrong, despite the backend
restriction that the stats creator must have ownership privilege
on the table:

regression=# create user tableowner;
CREATE ROLE
regression=# create user statsowner;
CREATE ROLE
regression=# grant tableowner to statsowner;
GRANT ROLE
regression=# \c - tableowner
You are now connected to database "regression" as user "tableowner".
regression=> create table sometable (f1 int, f2 int);
CREATE TABLE
regression=> \c - statsowner
You are now connected to database "regression" as user "statsowner".
regression=> create statistics somestats on f1, f2 from sometable;
CREATE STATISTICS

3. pg_dump decides whether to dump a stats object on the basis of whether
its underlying table is going to be dumped. While that's not totally
silly today, it likewise isn't going to scale to multi-table stats.
I propose that we should just treat extended stats objects like other
generic database objects --- which, in practice, means "dump if the
containing schema is getting dumped". We don't exclude views or matviews
on the basis of whether their underlying tables are going to be dumped,
so why should stats objects operate differently from those?

4. pg_dump issues a separate query against pg_statistic_ext for every
table it's contemplating dumping. This is pretty inefficient, at least
for people with lots of tables, and it also is going to be flat broken
when multi-table stats arrive. We should just do one query and iterate
through all the stats objects at once. Changing the points above will
eliminate any need for pg_dump to associate stats objects with particular
underlying tables at all, so that there's no value in the per-table way.

Points 1 and 2 clearly constitute bugs whose fixes need to be back-patched
into v10, because dump/restore will misbehave today on stats objects
violating those presumptions. One could argue for leaving point 3 alone
in v10, but I'm inclined to back-patch that change as well rather than
allowing people to get used to a non-future-proof behavior.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-02-11 06:10:50 Re: CALL stmt, ERROR: unrecognized node type: 113 bug
Previous Message Petr Jelinek 2018-02-11 04:20:41 Re: ALTER TABLE ADD COLUMN fast default