Re: Indexes for hashes

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Ivan Voras <ivoras(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Indexes for hashes
Date: 2016-06-17 03:51:03
Message-ID: CAGTBQpY7apkp79d2a+mgz-o0MggLrY-nGaMFZBjvGCuvyAA75A@mail.gmail.com
Lists: pgsql-performance

On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras <ivoras(at)gmail(dot)com> wrote:
>
> I have an application which stores a large number of hex-encoded hash
> strings (nearly 100 GB of them), which means:
>
> The number of distinct characters (alphabet) is limited to 16
> Each string is of the same length, 64 characters
> The strings are essentially random
>
> Creating a B-Tree index on this results in the index size being larger than
> the table itself, and there are disk space constraints.
>
> I've found the SP-GIST radix tree index, and thought it could be a good
> match for the data because of the above constraints. An attempt to create it
> (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
> than 12 hours (while a similar B-tree index takes a few hours at most), so
> I've interrupted it because "it probably is not going to finish in a
> reasonable time". Some slides I found on the spgist index suggest that both
> its build time and size make it unsuitable for this purpose.

I've found that btree indexes over a hash of the value tend to perform well in these situations:

CREATE INDEX ON t USING btree (hashtext(fieldname));

However, you'll have to modify your queries to filter on both the
hashtext and the text itself, since hashtext() returns a 32-bit integer
and can therefore collide:

SELECT * FROM t
WHERE hashtext(fieldname) = hashtext('blabla')
  AND fieldname = 'blabla';
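
For reference, a minimal self-contained sketch of the setup (the table and
index names here are illustrative assumptions, not taken from your schema):

CREATE TABLE t (fieldname text NOT NULL);

-- Expression index over hashtext(fieldname): the keys are 4-byte integers
-- instead of 64-character strings, so the index stays much smaller.
CREATE INDEX t_fieldname_hashtext_idx ON t (hashtext(fieldname));

-- Compare the on-disk sizes of the table and the index.
SELECT pg_size_pretty(pg_relation_size('t')) AS table_size,
       pg_size_pretty(pg_relation_size('t_fieldname_hashtext_idx')) AS index_size;

Because this is an expression index, the planner will only consider it when
the query contains the same hashtext(fieldname) expression, which is why the
lookup has to be written with both predicates as shown above.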
