Re: Indexes for hashes

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Ivan Voras <ivoras(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Indexes for hashes
Date: 2016-06-15 13:38:33
Message-ID: 20160615133833.GA29449@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> I have an application which stores a large amounts of hex-encoded hash
> strings (nearly 100 GB of them), which means:

Why do you keep them hex encoded, and not use bytea?

I made a sample table with 1 million rows, looking like this:

Table "public.new"
Column | Type | Modifiers
---------+-------+-----------
texthex | text |
a_bytea | bytea |

values are like:

$ select * from new limit 10;
texthex | a_bytea
------------------------------------------------------------------+--------------------------------------------------------------------
c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
(10 rows)

created two indexes:
create index i1 on new (texthex);
create index i2 on new (a_bytea);

i1 is 91MB, and i2 is 56MB.

Index creation was also much faster - best out of 3 runs for i1 was 4928.982
ms, best out of 3 runs for i2 was 2047.648 ms

Best regards,

depesz

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ivan Voras 2016-06-15 13:54:07 Re: Indexes for hashes
Previous Message ktm@rice.edu 2016-06-15 13:16:14 Re: Indexes for hashes