On Sat, Nov 02, 2019 at 11:35:09PM +0300, Павел Ерёмин wrote:
The proposed option is not much different from what it is now.
We are not trying to save some space - we will reuse the existing one. We
just work in 64 bit transaction counters. Correct me if I'm wrong - the
address of the next version of the line is stored in the 6 byte field
t_cid in the tuple header - which is not attached to the current page in
any way - and can be stored anywhere in the table. Nothing changes.--
I think you mean t_ctid, not t_cid (which is a 4-byte CommandId, not any
sort of item pointer).
I think this comment from htup_details.h explains the issue:
* ... Beware however that VACUUM might
* erase the pointed-to (newer) tuple before erasing the pointing (older)
* tuple. Hence, when following a t_ctid link, it is necessary to check
* to see if the referenced slot is empty or contains an unrelated tuple.
* Check that the referenced tuple has XMIN equal to the referencing tuple's
* XMAX to verify that it is actually the descendant version and not an
* unrelated tuple stored into a slot recently freed by VACUUM. If either
* check fails, one may assume that there is no live descendant version.
Now, imagine you have a tuple that gets updated repeatedly (say, 3x) and
each version gets to a different page. Say, pages #1, #2, #3. And then
VACUUM happens on some of the "middle" page (this may happen when trying
to fit new row onto a page to allow HOT, but it might happen even during
regular VACUUM).
So we started with 3 tuples on pages #1, #2, #3, but now we have this
#1 - tuple exists, points to tuple on page #2
#2 - tuple no longer exists, cleaned up by vacuum
#3 - tuple exists
The scheme you proposed requires existence of all the tuples in the
chain to determine visibility. When tuple #2 no longer exists, it's
impossible to decide whether tuple on page #1 is visible or not.
This also significantly increases the amount of random I/O, pretty much
by factor of 2, because whenever you look at a row, you also have to
look at the "next version" which may be on another page. That's pretty
bad, bot for I/O and cache hit ratio. I don't think that's a reasonable
trade-off (at least compared to simply making the XIDs 64bit).
regards
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services