From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)2ndQuadrant(dot)com>
Subject: Re: Bogosities in pg_dump's extended statistics support
Date: 2018-02-12 15:07:08
Message-ID: 20180212150708.4vt2p2237fyh6bfx@alvherre.pgsql
Lists: pgsql-hackers
Thanks for patching. Your points #1, #2 and #4 are OK with me. I'm
thinking about point #3, concerning when-to-dump policy for stats
objects:
Tom Lane wrote:
> 3. pg_dump decides whether to dump a stats object on the basis of whether
> its underlying table is going to be dumped. While that's not totally
> silly today, it likewise isn't going to scale to multi-table stats.
> I propose that we should just treat extended stats objects like other
> generic database objects --- which, in practice, means "dump if the
> containing schema is getting dumped". We don't exclude views or matviews
> on the basis of whether their underlying tables are going to be dumped,
> so why should stats objects operate differently from those?
I claim that views and matviews are indeed separate objects rather than
sub-objects belonging to a table; but a stats object is, in some way,
just part of the table(s) definition(s), so the analogy doesn't
necessarily hold. In particular, the decision to dump a stats object
should be tied to its table(s) rather than made for the stats object
standing alone.
In other words, I think the original policy of dumping a single-table
stats object whenever the table is dumped should be retained. I think
this should be a simple patch on top of what you committed.
The follow-up question is what to do for multi-table stats objects. I
think the options are that such an object should be dumped if:
1. any of its tables are dumped
2. *all* its tables are dumped
3. the schema containing the stats object is dumped
I think #2 is a loser: the object goes missing if you dump the tables
individually.
#3 has a similar problem: if you put the stats object in a schema that
is not that of any of the involved tables (admittedly a somewhat odd
choice), then again the stats object goes missing, and the only
indication that the dump is incomplete is that the application becomes
slower -- a situation that is hard to detect (unlike missing regular
objects, where queries break in obvious ways).
My preference is for #1. It has the small problem that if you have one
dump per table, the stats object appears in more than one dump (so
restoring them all causes an error during restore of the second one),
but I don't think this is terrible.
I think you could go even further and say (combination of #1 and #3):
4. dump the stats object if any of the involved tables are dumped, *or*
the schema containing the stats object is dumped.
with the rationale that if you explicitly dump a schema in which you've
specifically put a number of stats objects, then they should appear in
the dump, even if the tables are not dumped.
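The candidate policies can be sketched as simple predicates. This is a
purely illustrative toy, not pg_dump's actual logic; the function and
all its names are hypothetical:

```python
# Toy sketch of the candidate dump policies for a multi-table stats
# object (hypothetical names; not pg_dump code).

def should_dump(policy, stats_schema, stats_tables, dumped_tables, dumped_schemas):
    any_table = bool(set(stats_tables) & set(dumped_tables))   # policy #1
    all_tables = set(stats_tables) <= set(dumped_tables)       # policy #2
    own_schema = stats_schema in dumped_schemas                # policy #3
    return {1: any_table,
            2: all_tables,
            3: own_schema,
            4: any_table or own_schema}[policy]                # policy #4

# A stats object in schema "stats" covering tables t1 and t2,
# while dumping only table t1 and no schema wholesale:
args = ("stats", {"t1", "t2"}, {"t1"}, set())
print([should_dump(p, *args) for p in (1, 2, 3, 4)])
# → [True, False, False, True]: policies #2 and #3 silently omit
#   the object in this scenario; #1 and #4 keep it.
```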
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services