Re: Indexes for hashes

From: julyanto SUTANDANG <julyanto(at)equnix(dot)co(dot)id>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Ivan Voras <ivoras(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Indexes for hashes
Date: 2016-06-17 04:09:02
Message-ID: CAGu3fESMbee4ZXDTKwWgN0U7j99rYy3f6vRcjp44Qr88Mp+=TA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

This way is doing faster using crc32(data) than hashtext since crc32 is
hardware accelerated in intel (and others perhaps)
this way (crc32) is no way the same as hash, much way faster than
others...

Regards,

On Fri, Jun 17, 2016 at 10:51 AM, Claudio Freire <klaussfreire(at)gmail(dot)com>
wrote:

> On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras <ivoras(at)gmail(dot)com> wrote:
> >
> > I have an application which stores a large amounts of hex-encoded hash
> > strings (nearly 100 GB of them), which means:
> >
> > The number of distinct characters (alphabet) is limited to 16
> > Each string is of the same length, 64 characters
> > The strings are essentially random
> >
> > Creating a B-Tree index on this results in the index size being larger
> than
> > the table itself, and there are disk space constraints.
> >
> > I've found the SP-GIST radix tree index, and thought it could be a good
> > match for the data because of the above constraints. An attempt to
> create it
> > (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
> > than 12 hours (while a similar B-tree index takes a few hours at most),
> so
> > I've interrupted it because "it probably is not going to finish in a
> > reasonable time". Some slides I found on the spgist index allude that
> both
> > build time and size are not really suitable for this purpose.
>
>
> I've found that hash btree indexes tend to perform well in these
> situations:
>
> CREATE INDEX ON t USING btree (hashtext(fieldname));
>
> However, you'll have to modify your queries to query for both, the
> hashtext and the text itself:
>
> SELECT * FROM t WHERE hashtext(fieldname) = hashtext('blabla') AND
> fieldname = 'blabla';
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--

Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 22866662 F: +62216315281 M: +628164858028

Caution: The information enclosed in this email (and any attachments) may
be legally privileged and/or confidential and is intended only for the use
of the addressee(s). No addressee should forward, print, copy, or otherwise
reproduce this message in any manner that would allow it to be viewed by
any individual not originally listed as a recipient. If the reader of this
message is not the intended recipient, you are hereby notified that any
unauthorized disclosure, dissemination, distribution, copying or the taking
of any action in reliance on the information herein is strictly prohibited.
If you have received this communication in error, please immediately notify
the sender and delete this message.Unless it is made by the authorized
person, any views expressed in this message are those of the individual
sender and may not necessarily reflect the views of PT Equnix Business
Solutions.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2016-06-17 04:18:52 Re: Indexes for hashes
Previous Message Adam Brusselback 2016-06-17 04:07:17 Re: 9.6 query slower than 9.5.3