From: | Yeb Havinga <yebhavinga(at)gmail(dot)com> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for seg picksplit function |
Date: | 2010-11-05 15:53:04 |
Message-ID: | 4CD42860.4070706@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Alexander,
Here follows a review of your patch.
> Hackers,
>
> Seg contrib module contains the same bug in picksplit function as
> cube contrib module.
Good catch! :-)
> Also, Guttman's split algorithm is not needed in unidimensional case,
> because sorting based algorithm is good in this case.
I had some doubts whether this is true in the general case, instead of
the given example. I increased the interval width in your example to
0.25*b instead of 0.00005*b, with the purpose to increase overlaps
between intervals. Though the performance gain was less, it was still
faster than Guttmans algorithm. To make things worse I also tested with
an interval with of 1*b, resulting in a lot of overlaps and compared
several overlap queries. The sorting algorithm was 25% to 40% faster on
searches. Index creation time with the sorting algorithm is also a
fraction of the original creation time.
Since this testing could be part of a review, I looked at the code as
well and listed myself as reviewer on the commitfest.
Comparing with gbt_num_picksplit reveals some differences with sort
array intialization and size, the former's sort array starts at index 1
(FirstOffsetNumber), your implementation starts at 0 for sorting and
hence the size of the sorting array can be one element less. I prefer
your way of sort array initialization; gbt_num_pickplits's use of
FirstOffsetNumber of the qsort array seems to mix a define from the
gist/btree namespace for no reason and might even lead to confusion.
The remaining part of the new picksplit function puts the segs into left
or right, I think the code is easier to understand if there was only one
for loop from i=1 to 1 < maxoff, for the current code I had to verify
that all sort array entries were really used with the two seperate loops
that also skipped the first value. I edited the code a bit, and also
used seg_union to initialize/palloc the datum values. Finally, waste and
firsttime variables were initialized but not used anymore, so removed.
Attached is a revised patch.
regards,
Yeb Havinga
PS: when comparing with gbt_num_picksplit, I noticed that that one does
not update v->spl_ldatum and spl_rdatum to the union datums, but
initializes these to 0 at the beginning and never seems to update them.
Not sure if this is a problem since the num_picksplit stuff seems to
work well.
Attachment | Content-Type | Size |
---|---|---|
seg_picksplit_fix-0.2.patch | text/x-patch | 5.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2010-11-05 16:03:38 | Re: ALTER OBJECT any_name SET SCHEMA name |
Previous Message | Tom Lane | 2010-11-05 15:44:14 | Re: ALTER TABLE ... IF EXISTS feature? |