Quick Links

Re: WIP: Fast GiST index build

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: Fast GiST index build
Date:	2011-08-11 07:35:46
Message-ID:	CAPpHfdsE-=Zv8fN1g3x1bAyUHZnn-iTRd6mO_e12xmr1WxhYWg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Aug 11, 2011 at 10:21 AM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Split of an internal node works like this:
>
> 1. Gather all the existing tuples on the page, plus the new tuple being
> inserted.
> 2. Call picksplit on the tuples, to divide them into pages
> 3. Go through all tuples on the buffer associated with the page, and divide
> them into buffers on the new pages. This is done by calling penalty function
> on each buffered tuple.
>
> I wonder if it would be better for index quality to pass the buffered
> tuples to picksplit in the 2nd step, so that they too can affect the split
> decision. Maybe it doesn't make much difference in practice..

I had this idea. But:
1) Buffer contain much more tuples than page plus new tuple.
2) Picksplit method can easily be quadratic for example.

Let's see the complexity of picksplit algorithms:
1) geometric datatypes (point, box etc) - O(n) (BTW, I have serious doubts
about it, i.e. O(n*log(n)) algorithm can be in times better in many cases)
2) pg_trgm and fts - O(n^2)
3) seg - O(n*log(n))
4) cube - O(n^2)

Thus, I believe such feature should be an optional. We can try it as add-on
patch.

------
With best regards,
Alexander Korotkov.

In response to

Re: WIP: Fast GiST index build at 2011-08-11 06:21:17 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Magnus Hagander	2011-08-11 08:08:31	Re: sha1, sha2 functions into core?
Previous Message	Peter Eisentraut	2011-08-11 07:16:42	Re: XMLATTRIBUTES vs. values of type XML