Quick Links

Re: Resurrecting per-page cleaner for btree

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	pgsql-patches(at)postgresql(dot)org, teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp
Subject:	Re: Resurrecting per-page cleaner for btree
Date:	2006-07-25 19:37:58
Message-ID:	23887.1153856278@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-patches

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> This is a revised patch originated by Junji TERAMOTO for HEAD.
> [BTree vacuum before page splitting]
> http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> I think we can resurrect his idea because we will scan btree pages
> at-atime now; the missing-restarting-point problem went away.

I've applied this but I'm now having some second thoughts about it,
because I'm seeing an actual *decrease* in pgbench numbers from the
immediately prior CVS HEAD code. Using
pgbench -i -s 10 bench
pgbench -c 10 -t 1000 bench (repeat this half a dozen times)
with fsync off but all other settings factory-stock, what I'm seeing
is that the first run looks really good but subsequent runs tail off in
spectacular fashion :-( Pre-patch there was only minor degradation in
successive runs.

What I think is happening is that because pgbench depends so heavily on
updating existing records, we get into a state where an index page is
about full and there's one dead tuple on it, and then for each insertion
we have

* check for uniqueness marks one more tuple dead (the
next-to-last version of the tuple)
* newly added code removes one tuple and does a write
* now there's enough room to insert one tuple
* lather, rinse, repeat, never splitting the page.

The problem is that we've traded splitting a page every few hundred
inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
record, on *every* insert. This is not good.

Had you done any performance testing on this patch, and if so what
tests did you use? I'm a bit hesitant to try to fix it on the basis
of pgbench results alone.

One possible fix that comes to mind is to only perform the cleanup
if we are able to remove more than one dead tuple (perhaps about 10
would be good). Or do the deletion anyway, but then go ahead and
split the page unless X amount of space has been freed (where X is
more than just barely enough for the incoming tuple).

After all the thought we've put into this, it seems a shame to
just abandon it :-(. But it definitely needs more tweaking.

regards, tom lane

In response to

Resurrecting per-page cleaner for btree at 2006-07-13 01:49:33 from ITAGAKI Takahiro

Responses

Re: [HACKERS] Resurrecting per-page cleaner for btree at 2006-07-26 01:38:21 from Bruce Momjian
Re: Resurrecting per-page cleaner for btree at 2006-07-26 05:05:08 from ITAGAKI Takahiro
Re: [HACKERS] Resurrecting per-page cleaner for btree at 2006-08-10 02:00:02 from Bruce Momjian

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Joachim Wieland	2006-07-25 19:45:05	status of yet another timezone todo item
Previous Message	Dave Page	2006-07-25 19:21:44	Re: root/administartor user check option.

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Bruce Momjian	2006-07-25 21:47:06	Re: LDAP lookup of connection parameters
Previous Message	Tom Lane	2006-07-25 18:02:31	Re: Resurrecting per-page cleaner for btree