Re: HOT for PostgreSQL 8.3

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, mark(at)mark(dot)mielke(dot)cc, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Pavan Deolasee <pavan(dot)deolasee(at)enterprisedb(dot)com>, Nikhil S <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Subject: Re: HOT for PostgreSQL 8.3
Date: 2007-02-14 09:13:14
Message-ID: 1171444394.3446.49.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ühel kenal päeval, T, 2007-02-13 kell 09:38, kirjutas Tom Lane:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> > Hannu Krosing wrote:
> >> Are we actually doing that ? I.E are null bitmaps really allocated in 1
> >> byte steps nowadays ?
>
> > Yes.
>
> Not really; we still have to MAXALIGN at the end of the bitmap. The
> point is that you can get 8 bits in there before paying the first
> additional MAXALIGN increment.
>
> It's all moot anyway since 8 bits isn't enough for a pointer ...

With 8k pages and MAXALIGN=8 we just barely can, as with current page
structure (tuple headers together with data) the minimal tuple size for
that case is 24 for header + 8 for data = 32 bytes which means 256
tuples per page (minus page header) and so we can store tuple number in
8 bits.

OTOH, for same page HOT tuples, we have the command and trx ids stored
twice first as cmax,xmax of the old tuple and as cmin,xmin of the
updated tuple. One of these could probably be used for in-page HOT tuple
pointer.

As we can't rely on get-xmax-from-next-in-chain behaviour for off-page
tuples, we need to move the other way, that is storing a pointer to
previous version and getting xmin from there. That means keeping a chain
back pointers in xmin fields for all but the oldest tuple in HOT
chain/cycle.

This requires a little shuffling around of xmin/xmax info when removing
dead HOT chain members, but this would void the need to do any index
changes during HOT updates. Avoiding all index updates is probably good,
as it means we dont need to do any index page locking.

So the proposed structure for HOT chain is like this:

* Oldest tuple in chain has real xmin/xmax, this needs a flag to be
recognisable, of we may just start transaction counter at 64K so that
any xmin that fits in 2 bytes is immediately recognized as hot back
pointer. Or start cmin at 1 and use cmin=0 as pointer flag.

* All newer tuples have real xmax, and in-page pointer to previous
version in place of xmin. the real xmin is xmax of the previous tuple.
When previous tuple is removed, xmin is stored in the new oldest tuple.

* Index can point to any tuple in HOT chain. Finding the tuple by index

* To find a tuple, one errives to (possibly) center of HOT chain, and
the tuple may be found either by following the in-page pointer stored in
xmin field or ctid pointers.

* If the tuples ctid is pointing to off page, which means it must thus
be the newest one in HOT chain and also end of it. The chain ends also
when the tuple is inserted by an rollbacked transaction.

* there is a special case when the index pointer points to a rollbacked
tuple. In this case the tuple can't be removed, but should be reused on
next update. But this is the case for any HOT rollbacks.

The proposed structure should enables cyclic reuse of dead tuples as
they fall out of visibility window and avoids updating index pointers.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zeugswetter Andreas ADI SD 2007-02-14 09:21:36 Re: Fixing insecure security definer functions
Previous Message Tom Lane 2007-02-14 07:27:32 Re: "anyelement2" pseudotype