From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: Fast GiST index build |
Date: | 2011-08-11 07:35:46 |
Message-ID: | CAPpHfdsE-=Zv8fN1g3x1bAyUHZnn-iTRd6mO_e12xmr1WxhYWg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 11, 2011 at 10:21 AM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Split of an internal node works like this:
>
> 1. Gather all the existing tuples on the page, plus the new tuple being
> inserted.
> 2. Call picksplit on the tuples, to divide them into pages
> 3. Go through all tuples on the buffer associated with the page, and divide
> them into buffers on the new pages. This is done by calling penalty function
> on each buffered tuple.
>
> I wonder if it would be better for index quality to pass the buffered
> tuples to picksplit in the 2nd step, so that they too can affect the split
> decision. Maybe it doesn't make much difference in practice..
I had this idea. But:
1) Buffer contain much more tuples than page plus new tuple.
2) Picksplit method can easily be quadratic for example.
Let's see the complexity of picksplit algorithms:
1) geometric datatypes (point, box etc) - O(n) (BTW, I have serious doubts
about it, i.e. O(n*log(n)) algorithm can be in times better in many cases)
2) pg_trgm and fts - O(n^2)
3) seg - O(n*log(n))
4) cube - O(n^2)
Thus, I believe such feature should be an optional. We can try it as add-on
patch.
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2011-08-11 08:08:31 | Re: sha1, sha2 functions into core? |
Previous Message | Peter Eisentraut | 2011-08-11 07:16:42 | Re: XMLATTRIBUTES vs. values of type XML |