Re: Indexes for hashes

From: Ivan Voras <ivoras(at)gmail(dot)com>
To: depesz(at)depesz(dot)com
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Indexes for hashes
Date: 2016-06-15 13:54:07
Message-ID: CAF-QHFXJasdP-jur7JqdoP-mvmVmFN1pVoRf04_0pM6poNrsog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I understand your idea, and have also been thinking about it. Basically,
existing applications would need to be modified, however slightly, and that
wouldn't be good.

On 15 June 2016 at 15:38, hubert depesz lubaczewski <depesz(at)depesz(dot)com>
wrote:

> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> > I have an application which stores a large amounts of hex-encoded hash
> > strings (nearly 100 GB of them), which means:
>
> Why do you keep them hex encoded, and not use bytea?
>
> I made a sample table with 1 million rows, looking like this:
>
> Table "public.new"
> Column | Type | Modifiers
> ---------+-------+-----------
> texthex | text |
> a_bytea | bytea |
>
> values are like:
>
> $ select * from new limit 10;
> texthex |
> a_bytea
>
> ------------------------------------------------------------------+--------------------------------------------------------------------
> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
> (10 rows)
>
> created two indexes:
> create index i1 on new (texthex);
> create index i2 on new (a_bytea);
>
> i1 is 91MB, and i2 is 56MB.
>
> Index creation was also much faster - best out of 3 runs for i1 was
> 4928.982
> ms, best out of 3 runs for i2 was 2047.648 ms
>
> Best regards,
>
> depesz
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message julyanto SUTANDANG 2016-06-15 13:58:15 Re: Indexes for hashes
Previous Message hubert depesz lubaczewski 2016-06-15 13:38:33 Re: Indexes for hashes