From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Hot Standby and VACUUM FULL |
Date: | 2010-02-01 20:20:39 |
Message-ID: | 16001.1265055639@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I wrote:
> The design I sketched doesn't require such an assumption anyway. Once
> the map file is written, the relocation is effective, commit or no.
> As long as we restrict relocations to maintenance operations such as
> VACUUM FULL, which have no transactionally significant results, this
> doesn't seem like a problem. What we do need is that after a CLUSTER
> or V.F., which is going to relocate not only the rel but its indexes,
> the relocations of the rel and its indexes have to all "commit"
> atomically. But saving up the transaction's map changes and applying
> them in one write takes care of that.
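[Illustration, not part of the original message.] The "save up the map changes and apply them in one write" idea can be sketched roughly as follows. This is a toy model only: the class `RelationMap`, its methods, the JSON file layout, and the OIDs in the usage note are all hypothetical, and the sketch uses a write-to-temp-file-then-rename step as a stand-in for whatever single-write mechanism the real map file would use.

```python
import json
import os
import tempfile


class RelationMap:
    """Toy stand-in for a map file from relation OID to relfilenode.

    Hypothetical illustration; this is not PostgreSQL's actual format or API.
    """

    def __init__(self, path):
        self.path = path
        self.pending = {}  # relocations saved up during the operation

    def load(self):
        """Return the mapping currently visible on disk."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            # JSON object keys are strings, so convert them back to ints.
            return {int(k): v for k, v in json.load(f).items()}

    def stage(self, rel_oid, new_filenode):
        """Remember a relocation without making it visible yet."""
        self.pending[rel_oid] = new_filenode

    def apply_pending(self):
        """Make every staged relocation visible in one step."""
        if not self.pending:
            return
        mapping = self.load()
        mapping.update(self.pending)
        dir_name = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=dir_name)
        with os.fdopen(fd, "w") as f:
            json.dump(mapping, f)
            f.flush()
            os.fsync(f.fileno())
        # Assumption: rename-into-place stands in for "one write"; readers
        # see either the old map or the new one, never a mixture.
        os.replace(tmp_path, self.path)
        self.pending.clear()
```

Staging a relocation for a rel and for its index (say `stage(1259, 16500)` and `stage(2662, 16501)`, with illustrative numbers) leaves the on-disk map untouched until `apply_pending()` runs, at which point both relocations appear together.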
BTW, I noticed a couple of other issues that need to be dealt with to
make that safe. During CLUSTER/V.F. we typically try to update the
relation's relfilenode, relpages, reltuples, relfrozenxid (in
setNewRelfilenode) as well as its reltoastrelid (in swap_relation_files).
These are regular transactional updates to the pg_class tuple that will
fail to commit if the outer transaction rolls back. However:
* For a mapped relation, both the old and new relfilenode will be zero,
so it doesn't matter.
* Possibly losing the updates of relpages and reltuples is not critical.
* For relfrozenxid, we can simply force the new and old values to be the
same rather than hoping to advance the value, if we're dealing with a
mapped relation. Or just let it be; I think that losing an advance
of relfrozenxid wouldn't be critical either.
* We cannot change the toast rel OID of a shared catalog -- there's no
way to propagate that into the other copies of pg_class. So we need to
rejigger the logic for heap rewriting a little bit. Toast rel swapping
has to be handled by swapping the toast rels' relfilenodes, not their
OIDs. This is no big deal as far as cluster.c itself is concerned, but
the tricky part is that when we write new toasted values into the new
toast rel, the TOAST pointers going into the new heap have to be written
with the original toast-table OID value, not the one that the transient
target toast rel has got. This is doable, but it would uglify the TOAST
API a bit, I think. Another possibility is to treat the toast rel OID of
a catalog as something that can be supplied by the map file. Not sure
which way to jump.
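[Illustration, not part of the original message.] The first option in the last bullet (swap relfilenodes, but stamp TOAST pointers with the original toast-table OID) can be modeled in a few lines. Everything here is a hypothetical toy: `ToastPointer`, `ToastRel`, `toast_save`, its `pointer_relid` parameter, and the in-memory `storage` dict are invented for the sketch and are not the real TOAST API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToastPointer:
    toast_relid: int  # OID of the toast table the pointer refers to
    value_id: int     # identifier of the toasted value inside that table


class ToastRel:
    """Toy toast relation: a fixed OID plus a swappable relfilenode."""

    def __init__(self, oid, filenode):
        self.oid = oid
        self.filenode = filenode


# Toy "disk": relfilenode -> {value_id: data}
storage = {}


def toast_save(target, value_id, data, pointer_relid=None):
    """Store data in target's file and return a pointer to it.

    pointer_relid is the hypothetical extra parameter: when given, the
    pointer is stamped with that OID instead of the target rel's own OID.
    """
    storage.setdefault(target.filenode, {})[value_id] = data
    relid = target.oid if pointer_relid is None else pointer_relid
    return ToastPointer(relid, value_id)


def swap_filenodes(a, b):
    """Swap physical storage while leaving both OIDs unchanged."""
    a.filenode, b.filenode = b.filenode, a.filenode


def toast_fetch(rels_by_oid, ptr):
    """Resolve a pointer: OID -> current relfilenode -> stored value."""
    rel = rels_by_oid[ptr.toast_relid]
    return storage[rel.filenode][ptr.value_id]
```

In this model, a rewrite would call `toast_save(transient, value_id, data, pointer_relid=original.oid)` for each value and then `swap_filenodes(original, transient)`. After the swap, the pointers resolve through the original OID to the new file, which is the property the bullet asks for. The extra parameter is also the kind of API wart the message is worried about.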
regards, tom lane