Re: Constant time insertion into highly non-unique

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Constant time insertion into highly non-unique
Date: 2005-04-15 16:31:44
Message-ID: 20050415163144.GS58835@decibel.org
Lists: pgsql-hackers

On Thu, Apr 14, 2005 at 06:36:44PM +0100, Simon Riggs wrote:
> I'd suggest a move right probability of 97% (divide by 16) for itemsz >
> 16 bytes and 94% (divide by 32) when itemsz >= 128
>
> Though I think functional indexes are the way to go there.

Why?

On a somewhat different note, what about storing or caching info about
what page has free space on it? ISTM that will be much more performant
than any kind of search, though you still do need to be careful about
bulk-inserts. My thought is this:

1. Go to the last page that we knew to have free space on it (last_free_page).
2. If that page is now full, continue scanning right.
3. If we hit the end of appropriate pages, do a page split.
4. Finally, when vacuum frees space in the index, it needs to set
last_free_page to the left-most page with free space, so that all free
space will be used.
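A minimal sketch of that greedy variant, over a toy model of the
rightward page chain (the page structure and helper names here are
illustrative, not PostgreSQL's actual btree API):

```python
# Toy model: each "page" holds up to CAPACITY items; pages form a
# rightward chain for one highly non-unique key value.
CAPACITY = 4

def greedy_insert(pages, last_free_page, item):
    """Insert starting at the cached last_free_page index.

    pages is a list of lists (the right-link chain); returns the new
    index to cache as last_free_page.
    """
    i = last_free_page
    # Scan right from the cached page until we find free space.
    while i < len(pages) and len(pages[i]) >= CAPACITY:
        i += 1
    if i == len(pages):
        # End of the chain: "split" by appending a fresh page.
        pages.append([])
    pages[i].append(item)
    return i

# Usage: the first two pages are full, so insertion lands on page 2,
# and 2 becomes the new cached last_free_page.
pages = [[0] * CAPACITY, [0] * CAPACITY, [1]]
cached = greedy_insert(pages, last_free_page=0, item=2)
```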

That would be the 'greediest' version of this algorithm possible. In
reality, it probably makes more sense to combine this with the current
behavior, so that you have a random chance of stopping your scan to the
right and just doing a page split wherever you are in the scan. This
would still be a win because most of the time you'd go straight to a
page with free space on it.
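The combined version could be sketched like this, reusing the toy page
model above and assuming (hypothetically) a small per-page split
probability in the spirit of the "divide by N" numbers quoted upthread:

```python
import random

CAPACITY = 4
SPLIT_PROB = 1 / 16  # illustrative: ~94% chance of moving right

def insert_with_random_split(pages, last_free_page, item,
                             rng=random.random):
    """Like the greedy scan, but with a small chance of splitting
    immediately instead of continuing right past a full page."""
    i = last_free_page
    while i < len(pages) and len(pages[i]) >= CAPACITY:
        if rng() < SPLIT_PROB:
            # Give up the scan and split here: insert the new page
            # immediately to the right of the current one.
            pages.insert(i + 1, [item])
            return i + 1
        i += 1
    if i == len(pages):
        pages.append([])
    pages[i].append(item)
    return i
```

With an rng that never fires, this degenerates to the pure greedy scan;
with one that always fires, it splits at the first full page it sees.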

As for storing the extra info, there are two possibilities that come to
mind: you can either store it in the index itself, or cache it in
shared memory. The advantage of just caching it is that you don't need
to change the on-disk index format, and you wouldn't need to update
something on disk every time last_free_page changes. The disadvantage
is that it's probably more complex to code, and you obviously lose
last_free_page info on restart, as well as for index values that see
fewer inserts, since their entries fall out of the cache.
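The cached variant might look roughly like a bounded map from index
value to last_free_page, with LRU-style eviction so that
rarely-inserted values fall out (the class, method names, and size are
invented for illustration; a real version would live in shared memory):

```python
from collections import OrderedDict

class FreeSpaceCache:
    """Tiny LRU cache mapping an index key value to the last page
    known to have free space for that value. On a miss the caller
    falls back to the normal scan-from-the-left behavior."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._map = OrderedDict()

    def lookup(self, key):
        page = self._map.get(key)
        if page is not None:
            self._map.move_to_end(key)  # mark as recently used
        return page  # None means "cache miss: do a full scan"

    def remember(self, key, page):
        self._map[key] = page
        self._map.move_to_end(key)
        if len(self._map) > self.max_entries:
            self._map.popitem(last=False)  # evict least recently used
```

Vacuum would call remember(key, leftmost_free_page) after freeing
space, and a restart simply starts with an empty cache, matching the
"lose info on restart" caveat above.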

In either case, you only want to worry about doing this for index values
that span more than a few pages (maybe the magic 3 that means you've got
at least one index page that's nothing but values for this index). For
more unique indexes, the extra overhead doesn't make sense.

Can someone think of a way to check the performance of this idea without
actually coding it?
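One way to estimate it without touching the btree code: a back-of-the-
envelope simulation of the page chain, comparing average pages touched
per insert under the pure scan-from-the-left strategy versus the cached
last_free_page strategy (a toy model of page fill, not the real layout):

```python
CAPACITY = 100  # items per simulated page

def simulate(n_inserts, use_cache):
    """Return average pages touched per insert for one key value."""
    pages = [0]   # item counts per page in the chain
    cached = 0    # last_free_page hint (ignored if not use_cache)
    touched = 0   # total pages visited across all inserts
    for _ in range(n_inserts):
        i = cached if use_cache else 0
        while pages[i] >= CAPACITY:
            touched += 1
            i += 1
            if i == len(pages):
                pages.append(0)  # "split" at the end of the chain
        touched += 1
        pages[i] += 1
        if use_cache:
            cached = i
    return touched / n_inserts

# With the cache, each insert touches O(1) pages; without it, the
# scan from the left grows with the number of full pages.
avg_cached = simulate(10000, use_cache=True)
avg_scan = simulate(10000, use_cache=False)
```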
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
