Re: Proposal: In-Place upgrade concept

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: In-Place upgrade concept
Date: 2007-07-03 18:53:26
Message-ID: 12029.1183488806@sss.pgh.pa.us
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Tue, Jul 03, 2007 at 11:36:03AM -0400, Tom Lane wrote:
>> ... (Thought experiment: a page is read in during crash recovery
>> or PITR slave operation, and discovered to have the old format.)

> Hmm, actually, what's the problem with PITR restoring a page in the old
> format? As long as it's clear it's the old format it'll get fixed when
> the page is actually used.

Well, what I'm concerned about is something like a WAL record providing
a new-format tuple to be inserted into a page, and then you find that
the page contains old-format tuples.

[ thinks some more... ] Actually, so long as we are willing to posit that

1. You're only allowed to upgrade a DB that's been cleanly shut down
(no replay of old-format WAL logs allowed)

2. Page format conversion is WAL-logged as a complete page replacement

then AFAICS WAL-reading operations should never have to apply any
updates to an old-format page; the first touch of any old page in the
WAL sequence should be a page replacement that updates it to new format.
This is the same argument that explains why full_page_writes ensures
recovery from write failures.
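
To make the argument above concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL code: the Page, FullPageImage and
InsertTuple classes, the version numbers, and the convert() stand-in are all
hypothetical names invented for illustration. It assumes the two conditions
listed above (clean shutdown before upgrade, conversion logged as a complete
page replacement) and shows that replay then never applies an incremental
update to an old-format page.

```python
# Toy model, not PostgreSQL code: pages carry a format version, and WAL
# records are either a full-page replacement or an incremental tuple insert.
from dataclasses import dataclass, field

OLD, NEW = 1, 2  # hypothetical page format versions


@dataclass
class Page:
    version: int
    tuples: list = field(default_factory=list)


@dataclass
class FullPageImage:      # condition 2: conversion is logged as a whole page
    page_no: int
    image: Page


@dataclass
class InsertTuple:        # an ordinary new-format update
    page_no: int
    tup: str


def convert(page):
    """Return a new-format copy of an old-format page (stand-in conversion)."""
    return Page(NEW, [t.upper() for t in page.tuples])


def replay(disk, wal):
    """Apply WAL records in order; fail if an update hits an old-format page."""
    for rec in wal:
        if isinstance(rec, FullPageImage):
            # Overwrites whatever is on disk, so the old format never matters.
            disk[rec.page_no] = Page(rec.image.version, list(rec.image.tuples))
        else:
            page = disk[rec.page_no]
            if page.version != NEW:
                raise RuntimeError("new-format update on old-format page")
            page.tuples.append(rec.tup)


# Base backup taken after the clean shutdown and upgrade: page 0 is still old.
base = {0: Page(OLD, ["a", "b"])}
# In the live system, the first touch converts the page and logs the result.
wal = [FullPageImage(0, convert(base[0])), InsertTuple(0, "C")]

restored = {0: Page(OLD, ["a", "b"])}
replay(restored, wal)
```

If the FullPageImage record is dropped from the list, replay() raises on the
InsertTuple record, which is the failure case described earlier in this
message.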

So in principle the page-conversion stuff should always operate in a
live transaction. (Which is good, because now that I think about it
we couldn't emit a WAL record for the page conversion in those other
contexts.) I still feel pretty twitchy about letting it do catalog
access, though, because it has to operate at such a low level of the
system. bufmgr.c has no business invoking anything that might do
catalog access. If nothing else there are deadlock issues.

On the whole I think we could define format conversions for user-defined
types as "not our problem". A new version of a UDT that has an
incompatible representation on disk can simply be treated as a new type
with a different OID, exactly as Zdenek was suggesting for index AMs.
To upgrade a database containing such a column, you install
"my_udt_old.so" that services the old representation, ALTER TYPE my_udt
RENAME TO my_udt_old, then install new type my_udt and start using that.
Anyway that seems good enough for version 1.0 --- I don't recall that
we've ever changed the on-disk representation of any contrib/ types,
so how important is this scenario in the real world?

regards, tom lane
