Re: HOT for PostgreSQL 8.3

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Hannu Krosing" <hannu(at)skype(dot)net>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, mark(at)mark(dot)mielke(dot)cc, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Pavan Deolasee" <pavan(dot)deolasee(at)enterprisedb(dot)com>, "Nikhil S" <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Subject: Re: HOT for PostgreSQL 8.3
Date: 2007-02-14 13:07:03
Message-ID: 2e78013d0702140507i4f782033he14cdaa80f697ba2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/14/07, Hannu Krosing <hannu(at)skype(dot)net> wrote:
>
>
> OTOH, for same page HOT tuples, we have the command and trx ids stored
> twice first as cmax,xmax of the old tuple and as cmin,xmin of the
> updated tuple. One of these could probably be used for in-page HOT tuple
> pointer.

I think we recently merged cmin/cmax into a single combo cid. So I don't
think we can use cmin for storing back pointer. Idea of using xmin seems
feasible though.

As we can't rely on get-xmax-from-next-in-chain behaviour for off-page
> tuples, we need to move the other way, that is storing a pointer to
> previous version and getting xmin from there. That means keeping a chain
> back pointers in xmin fields for all but the oldest tuple in HOT
> chain/cycle.

Why can't we store the real xmin/xmax at the end of the chain and store
forward pointers in the xmax ? I find the idea of having back pointers more
compelling though.

This requires a little shuffling around of xmin/xmax info when removing
> dead HOT chain members, but this would void the need to do any index
> changes during HOT updates. Avoiding all index updates is probably good,
> as it means we dont need to do any index page locking.

I agree. If we can design a solution that does not require any index
updates,
that would save us a lot. One problem with such a approach though is that we
might have to live with a dead tuple/stub until another update occurs and
the tuple/stub is reused.

So the proposed structure for HOT chain is like this:
>
> * Oldest tuple in chain has real xmin/xmax, this needs a flag to be
> recognisable, of we may just start transaction counter at 64K so that
> any xmin that fits in 2 bytes is immediately recognized as hot back
> pointer. Or start cmin at 1 and use cmin=0 as pointer flag.
>
> * All newer tuples have real xmax, and in-page pointer to previous
> version in place of xmin. the real xmin is xmax of the previous tuple.
> When previous tuple is removed, xmin is stored in the new oldest tuple.
>
> * Index can point to any tuple in HOT chain. Finding the tuple by index
>
> * To find a tuple, one errives to (possibly) center of HOT chain, and
> the tuple may be found either by following the in-page pointer stored in
> xmin field or ctid pointers.

Can we quickly figure out based on the given snapshot and the root tuple
xmin/xmax, whether to follow the forward or the backward pointer to arrive
at a visible tuple ?

* If the tuples ctid is pointing to off page, which means it must thus
> be the newest one in HOT chain and also end of it. The chain ends also
> when the tuple is inserted by an rollbacked transaction.
>
> * there is a special case when the index pointer points to a rollbacked
> tuple. In this case the tuple can't be removed, but should be reused on
> next update. But this is the case for any HOT rollbacks.
>
>
One problem is with aborted HOT-updates. According to this scheme, xmin
of the aborted version is stored in the previous version. What happens when
that gets updated again and committed ? If we follow the back pointer from
the aborted tuple to fetch the xmin, we would actually be fetching the txid
of the committed transaction which later updated the tuple. This would fool
us to incorrectly believe that the aborted tuple is LIVE.

May be we can set XMIN_INVALID for the next tuple in the chain when
a tuple being updated has t_ctid pointing to a tuple in the same page,
other than itself.

I am sure there are interesting interactions between vacuum-ing that needs
to be considered as well.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2007-02-14 13:20:27 Re: Writing triggers in C++
Previous Message Florian G. Pflug 2007-02-14 13:06:35 Re: Writing triggers in C++