Re: Indexes for hashes

From: Ivan Voras <ivoras(at)gmail(dot)com>
To: julyanto(at)equnix(dot)co(dot)id
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Indexes for hashes
Date: 2016-06-15 14:20:46
Message-ID: CAF-QHFW5uii7fw20fw5Q7t+vcYL1o48-_t=nKaN17ApCCmX6LA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

Just for testing... is there a fast (i.e. written in C) crc32 or a similar
small hash function for PostgreSQL?

On 15 June 2016 at 16:00, Ivan Voras <ivoras(at)gmail(dot)com> wrote:

> Hi,
>
> This idea is similar to the substring one, and while it does give
> excellent performance and small size, it requires application code
> modifications, so it's out.
>
>
> On 15 June 2016 at 15:58, julyanto SUTANDANG <julyanto(at)equnix(dot)co(dot)id>
> wrote:
>
>> Hi Ivan,
>>
>> How about using crc32 ? and then index the integer as the result of crc32
>> function? you can split the hash into 2 part and do crc32 2x ? and then
>> create composite index on both integer (the crc32 result)
>> instead of using 64 char, you only employ 2 integer as index key.
>>
>> Regards,
>>
>> Jul
>>
>> On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivoras(at)gmail(dot)com> wrote:
>>
>>> Hi,
>>>
>>> I understand your idea, and have also been thinking about it. Basically,
>>> existing applications would need to be modified, however slightly, and that
>>> wouldn't be good.
>>>
>>>
>>>
>>>
>>> On 15 June 2016 at 15:38, hubert depesz lubaczewski <depesz(at)depesz(dot)com>
>>> wrote:
>>>
>>>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
>>>> > I have an application which stores a large amounts of hex-encoded hash
>>>> > strings (nearly 100 GB of them), which means:
>>>>
>>>> Why do you keep them hex encoded, and not use bytea?
>>>>
>>>> I made a sample table with 1 million rows, looking like this:
>>>>
>>>> Table "public.new"
>>>> Column | Type | Modifiers
>>>> ---------+-------+-----------
>>>> texthex | text |
>>>> a_bytea | bytea |
>>>>
>>>> values are like:
>>>>
>>>> $ select * from new limit 10;
>>>> texthex |
>>>> a_bytea
>>>>
>>>> ------------------------------------------------------------------+--------------------------------------------------------------------
>>>> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
>>>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>>>> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
>>>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>>>> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
>>>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>>>> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
>>>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>>>> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
>>>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>>>> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
>>>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>>>> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
>>>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>>>> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
>>>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>>>> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
>>>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>>>> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
>>>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
>>>> (10 rows)
>>>>
>>>> created two indexes:
>>>> create index i1 on new (texthex);
>>>> create index i2 on new (a_bytea);
>>>>
>>>> i1 is 91MB, and i2 is 56MB.
>>>>
>>>> Index creation was also much faster - best out of 3 runs for i1 was
>>>> 4928.982
>>>> ms, best out of 3 runs for i2 was 2047.648 ms
>>>>
>>>> Best regards,
>>>>
>>>> depesz
>>>>
>>>>
>>>
>>
>>
>> --
>>
>>
>> Julyanto SUTANDANG
>>
>> Equnix Business Solutions, PT
>> (An Open Source an Open Mind Company)
>>
>> Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta
>> Pusat
>> T: +6221 22866662 F: +62216315281 M: +628164858028
>>
>>
>> Caution: The information enclosed in this email (and any attachments) may
>> be legally privileged and/or confidential and is intended only for the use
>> of the addressee(s). No addressee should forward, print, copy, or otherwise
>> reproduce this message in any manner that would allow it to be viewed by
>> any individual not originally listed as a recipient. If the reader of this
>> message is not the intended recipient, you are hereby notified that any
>> unauthorized disclosure, dissemination, distribution, copying or the taking
>> of any action in reliance on the information herein is strictly prohibited.
>> If you have received this communication in error, please immediately notify
>> the sender and delete this message.Unless it is made by the authorized
>> person, any views expressed in this message are those of the individual
>> sender and may not necessarily reflect the views of PT Equnix Business
>> Solutions.
>>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2016-06-15 14:36:13 Re: Indexes for hashes
Previous Message Ivan Voras 2016-06-15 14:00:22 Re: Indexes for hashes