Re: Indexes for hashes

From: julyanto SUTANDANG <julyanto(at)equnix(dot)co(dot)id>
To: Ivan Voras <ivoras(at)gmail(dot)com>
Cc: depesz(at)depesz(dot)com, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Indexes for hashes
Date: 2016-06-15 13:58:15
Message-ID: CAGu3fEQTM+1jjk0K2Y4fvtZMC7VeNMoWmJdhBQZRjg_r3V2_4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Ivan,

How about using crc32 ? and then index the integer as the result of crc32
function? you can split the hash into 2 part and do crc32 2x ? and then
create composite index on both integer (the crc32 result)
instead of using 64 char, you only employ 2 integer as index key.

Regards,

Jul

On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivoras(at)gmail(dot)com> wrote:

> Hi,
>
> I understand your idea, and have also been thinking about it. Basically,
> existing applications would need to be modified, however slightly, and that
> wouldn't be good.
>
>
>
>
> On 15 June 2016 at 15:38, hubert depesz lubaczewski <depesz(at)depesz(dot)com>
> wrote:
>
>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
>> > I have an application which stores a large amounts of hex-encoded hash
>> > strings (nearly 100 GB of them), which means:
>>
>> Why do you keep them hex encoded, and not use bytea?
>>
>> I made a sample table with 1 million rows, looking like this:
>>
>> Table "public.new"
>> Column | Type | Modifiers
>> ---------+-------+-----------
>> texthex | text |
>> a_bytea | bytea |
>>
>> values are like:
>>
>> $ select * from new limit 10;
>> texthex |
>> a_bytea
>>
>> ------------------------------------------------------------------+--------------------------------------------------------------------
>> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
>> (10 rows)
>>
>> created two indexes:
>> create index i1 on new (texthex);
>> create index i2 on new (a_bytea);
>>
>> i1 is 91MB, and i2 is 56MB.
>>
>> Index creation was also much faster - best out of 3 runs for i1 was
>> 4928.982
>> ms, best out of 3 runs for i2 was 2047.648 ms
>>
>> Best regards,
>>
>> depesz
>>
>>
>

--

Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 22866662 F: +62216315281 M: +628164858028

Caution: The information enclosed in this email (and any attachments) may
be legally privileged and/or confidential and is intended only for the use
of the addressee(s). No addressee should forward, print, copy, or otherwise
reproduce this message in any manner that would allow it to be viewed by
any individual not originally listed as a recipient. If the reader of this
message is not the intended recipient, you are hereby notified that any
unauthorized disclosure, dissemination, distribution, copying or the taking
of any action in reliance on the information herein is strictly prohibited.
If you have received this communication in error, please immediately notify
the sender and delete this message.Unless it is made by the authorized
person, any views expressed in this message are those of the individual
sender and may not necessarily reflect the views of PT Equnix Business
Solutions.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ivan Voras 2016-06-15 14:00:22 Re: Indexes for hashes
Previous Message Ivan Voras 2016-06-15 13:54:07 Re: Indexes for hashes