Re: Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-09-16 06:35:53
Message-ID: CAA4eK1+JzHUi5o0vKBg_=DpmFMuy_1wDtZM2EfLoAKAxZ8gMcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 15, 2016 at 7:53 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2016-05-10 17:39:22 +0530, Amit Kapila wrote:
>> For making hash indexes usable in production systems, we need to improve
>> its concurrency and make them crash-safe by WAL logging them.
>
> One earlier question about this is whether that is actually a worthwhile
> goal. Are the speed and space benefits big enough in the general case?
>

I think there will surely by speed benefits w.r.t reads for larger
indexes. All our measurements till now have shown that there is a
benefit varying from 30~60% (for reads) with hash index as compare to
btree, and I think it could be even more if we further increase the
size of index. On space front, I have not done any detailed study, so
it is not right to conclude anything, but it appears to me that if the
index is on char/varchar column where size of key is 10 or 20 bytes,
hash indexes should be beneficial as they store just hash-key.

> Could those benefits not be achieved in a more maintainable manner by
> adding a layer that uses a btree over hash(columns), and adds
> appropriate rechecks after heap scans?
>

I don't think it can be faster for reads than using real hash index,
but surely one can have that as a workaround.

> Note that I'm not saying that hash indexes are not worthwhile, I'm just
> doubtful that question has been explored sufficiently.
>

I think theoretically hash indexes are faster than btree considering
logarithmic complexity (O(1) vs. O(logn)), also the results after
recent optimizations indicate that hash indexes are faster than btree
for equal to searches. I am not saying after the recent set of
patches proposed for hash indexes they will be better in all kind of
cases. It could be beneficial for cases where indexed columns are not
updated heavily.

I think one can definitely argue that we can some optimizations in
btree and make them equivalent or better than hash indexes, but I am
not sure if it is possible for all-kind of use-cases.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-09-16 06:42:40 Re: Write Ahead Logging for Hash Indexes
Previous Message Pavel Stehule 2016-09-16 06:28:06 Re: Tackling JsonPath support