From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 1: additional information
Date: 2013-12-10 19:44:01
Message-ID: CAPpHfdtRAxaq8mShtpd4mh5R0=5hP900mBJNU5TnyaeM44EEyA@mail.gmail.com
Lists: pgsql-hackers
On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:
> On 12/09/2013 11:34 AM, Alexander Korotkov wrote:
>
>> On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com>wrote:
>>
>> Even if we use varbyte encoding, I wonder if it would be better to treat
>>> block + offset number as a single 48-bit integer, rather than encode them
>>> separately. That would allow the delta of two items on the same page to
>>> be
>>> stored as a single byte, rather than two bytes. Naturally it would be a
>>> loss on other values, but would be nice to see some kind of an analysis
>>> on
>>> that. I suspect it might make the code simpler, too.
>>>
>>
>> Yeah, I had that idea, but I thought it's not a better option. Will try to
>> do some analysis.
>>
>
> The more I think about that, the more convinced I am that it's a good
> idea. I don't think it will ever compress worse than the current approach
> of treating block and offset numbers separately, and, although I haven't
> actually tested it, I doubt it's any slower. About the same amount of
> arithmetic is required in both versions.
>
> Attached is a version that does that. Plus some other minor cleanup.
>
> (we should still investigate using a completely different algorithm,
> though)
Yes, when I thought about that, I didn't realize that we could reserve fewer
than 16 bits for the offset number.
I reran my benchmark and got the following results:
event | period
-----------------------+-----------------
index_build | 00:01:46.39056
index_build_recovery | 00:00:05
index_update | 00:06:01.557708
index_update_recovery | 00:01:23
search_new | 00:24:05.600366
search_updated | 00:25:29.520642
(6 rows)
label | blocks_mark
----------------+-------------
search_new | 847509920
search_updated | 883789826
(2 rows)
label | size
---------------+-----------
new | 364560384
after_updates | 642736128
(2 rows)
Speed is the same while the index size is smaller. With the previous format it was:
label | size
---------------+-----------
new | 419299328
after_updates | 715915264
(2 rows)
Good optimization, thanks. I'll try other datasets, but I expect similar
results.
However, the patch didn't apply to head. A corrected version is attached.
------
With best regards,
Alexander Korotkov.
Attachment: gin-packed-postinglists-20.patch.gz (application/x-gzip, 28.9 KB)