Re: Indexes for hashes

From: Ivan Voras <ivoras(at)gmail(dot)com>
To: julyanto(at)equnix(dot)co(dot)id
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Indexes for hashes
Date: 2016-06-15 14:00:22
Message-ID: CAF-QHFVk7nYMXwSUffGiWBtZ4+BaRhJks=XX4h6_SfStKVxLTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

This idea is similar to the substring one, and while it does give excellent
performance and small size, it requires application code modifications, so
it's out.

On 15 June 2016 at 15:58, julyanto SUTANDANG <julyanto(at)equnix(dot)co(dot)id> wrote:

> Hi Ivan,
>
> How about using crc32 ? and then index the integer as the result of crc32
> function? you can split the hash into 2 part and do crc32 2x ? and then
> create composite index on both integer (the crc32 result)
> instead of using 64 char, you only employ 2 integer as index key.
>
> Regards,
>
> Jul
>
> On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivoras(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> I understand your idea, and have also been thinking about it. Basically,
>> existing applications would need to be modified, however slightly, and that
>> wouldn't be good.
>>
>>
>>
>>
>> On 15 June 2016 at 15:38, hubert depesz lubaczewski <depesz(at)depesz(dot)com>
>> wrote:
>>
>>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
>>> > I have an application which stores a large amounts of hex-encoded hash
>>> > strings (nearly 100 GB of them), which means:
>>>
>>> Why do you keep them hex encoded, and not use bytea?
>>>
>>> I made a sample table with 1 million rows, looking like this:
>>>
>>> Table "public.new"
>>> Column | Type | Modifiers
>>> ---------+-------+-----------
>>> texthex | text |
>>> a_bytea | bytea |
>>>
>>> values are like:
>>>
>>> $ select * from new limit 10;
>>> texthex |
>>> a_bytea
>>>
>>> ------------------------------------------------------------------+--------------------------------------------------------------------
>>> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
>>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>>> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
>>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>>> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
>>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>>> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
>>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>>> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
>>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>>> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
>>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>>> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
>>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>>> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
>>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>>> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
>>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>>> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
>>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
>>> (10 rows)
>>>
>>> created two indexes:
>>> create index i1 on new (texthex);
>>> create index i2 on new (a_bytea);
>>>
>>> i1 is 91MB, and i2 is 56MB.
>>>
>>> Index creation was also much faster - best out of 3 runs for i1 was
>>> 4928.982
>>> ms, best out of 3 runs for i2 was 2047.648 ms
>>>
>>> Best regards,
>>>
>>> depesz
>>>
>>>
>>
>
>
> --
>
>
> Julyanto SUTANDANG
>
> Equnix Business Solutions, PT
> (An Open Source an Open Mind Company)
>
> Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta
> Pusat
> T: +6221 22866662 F: +62216315281 M: +628164858028
>
>
> Caution: The information enclosed in this email (and any attachments) may
> be legally privileged and/or confidential and is intended only for the use
> of the addressee(s). No addressee should forward, print, copy, or otherwise
> reproduce this message in any manner that would allow it to be viewed by
> any individual not originally listed as a recipient. If the reader of this
> message is not the intended recipient, you are hereby notified that any
> unauthorized disclosure, dissemination, distribution, copying or the taking
> of any action in reliance on the information herein is strictly prohibited.
> If you have received this communication in error, please immediately notify
> the sender and delete this message.Unless it is made by the authorized
> person, any views expressed in this message are those of the individual
> sender and may not necessarily reflect the views of PT Equnix Business
> Solutions.
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ivan Voras 2016-06-15 14:20:46 Re: Indexes for hashes
Previous Message julyanto SUTANDANG 2016-06-15 13:58:15 Re: Indexes for hashes