From: | "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Yet another fast GiST build |
Date: | 2020-09-09 16:50:44 |
Message-ID: | 8516B6A8-11B2-41A8-B7EB-D0C38ADD06C3@yandex-team.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> 9 сент. 2020 г., в 20:39, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> написал(а):
>
> On 09/09/2020 15:20, Darafei "Komяpa" Praliaskouski wrote:
>> On Wed, Sep 9, 2020 at 3:09 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> Come to think of it, the point z-order comparator could benefit a lot
>>> from key abbreviation, too. You could do the point -> zorder conversion
>>> in the abbreviation routine.
>> That's how it works in PostGIS, only that we moved to more
>> effecient Hilbert curve:
>> https://github.com/postgis/postgis/blob/54399b9f6b0f02e8db9444f9f042b8d4ca6d4fa4/postgis/lwgeom_btree.c#L171
>
> Thanks, that's interesting.
>
> I implemented the abbreviated keys for the point opclass, too, and noticed that the patch as it was never used it. I reworked the patch so that tuplesort_begin_index_gist() is responsible for looking up the sortsupport function, like tuplesort_begin_index_btree() does, and uses abbreviation when possible.
Wow, abbreviated sort made gist for points construction even 1.5x faster!
btw there is small typo in arg names in gist_bbox_zorder_cmp_abbrev(); z1,z2 -> a,b
> do we have regression test coverage for this?
Yes, sorting build for points is tested in point.sql, but with small dataset. index_including_gist.sql seems to be working with boxes, but triggers point paths too.
> , also on a SIZEOF_DATUM==4 system since the abbreviation works differently with that, and push if nothing new comes up. And clarify the documentation and/or comments that the sortsupport function sees "compressed" values.
>
> I wonder if we could use sorting to also speed up building tsvector indexes? The values stored there are bit signatures, what would be a good sort order for those?
We need an order so that nearby values have a lot of bits in common.
What is the length of this signature?
For each 4 bytes we can compute number of 1s in it's binary representation. Then z-order these dwords as values 0-32.
This will be very inefficient grouping, but it will tend to keep empty and dense 4-byte regions apart.
Thanks for working on this!
Best regards, Andrey Borodin.
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2020-09-09 17:00:15 | Re: Remove line length restriction in passwordFromFile() |
Previous Message | Tom Lane | 2020-09-09 16:48:17 | Re: Remove line length restriction in passwordFromFile() |