From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Speeding up pg_upgrade |
Date: | 2017-12-07 18:32:43 |
Message-ID: | 20171207183243.GF4628@tamriel.snowman.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Robert, all,
* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> >> It seems pretty clear to me that we should somehow transfer stats from
> >> the old server to the new one. Shouldn't it just be a matter of
> >> serializing the MCV/histogram/ndistinct values, then have capabilities
> >> to load on the new server?
> >
> > The reason pg_upgrade hasn't done that in the past is not wishing to
> > assume that the new version does stats identically to the old version.
> > Since we do in fact add stats or change stuff around from time to time,
> > that's not a negligible consideration.
>
> Yes, but we don't do that for every release. We could put rules into
> pg_upgrade about which releases changed the stats format incompatibly,
> and not transfer the stats when crossing between two releases with
> incompatible formats. That's more than zero effort, of course, but it
> might be worth it. We've already got CATALOG_VERSION_NO,
> XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
> BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
> GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
> PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
> K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.
If we go down that route, since this makes a pretty serious difference
in terms of what the user has to deal with post-pg_upgrade, I'd suggest
we require an additional option for the user to pass when stats aren't
going to be migrated, so they are aware of that.
The concern I have hear is that we end up changing things in v13 and
suddenly everyone has to re-analyze but they didn't to go from 10->11 or
11->12 and they'll get caught off-guard by it.
Of course, this might end up having an entirely different effect: it
might mean that we're suddenly a lot shier about changing the stats in a
backwards-incompatible way, just as we now are basically stuck with the
existing on-disk heap format..
> Now, I have to admit that I find the process of trying to remember to
> bump the correct set of version numbers in every commit just a tad
> frustrating; it adds a cognitive burden I'd just as well skip.
Agreed, would be great if we could improve on this.
> However, the failure to transfer stats over the years seems to have
> actually caused real problems for many users, so I think in this case
> we might be best off sucking it up and adding one more version number.
Yes, it's definitely been an issue for users.
> We might even want to make it a little more fine-grained and track it
> separately by data type, but I'm not sure if that's really worth it.
This would have the nice property that we could just re-analyze the data
types where things changed, something that's more likely to happen with
new data types than existing ones, I'd guess, and so that might be much
more reasonable for users.
Thanks!
Stephen
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2017-12-07 18:36:25 | Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com |
Previous Message | Robert Haas | 2017-12-07 18:22:23 | Re: Speeding up pg_upgrade |