Re: Perfomance of IN-clause with many elements and possible solutions

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Dmitry Lazurkin <dilaz03(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Perfomance of IN-clause with many elements and possible solutions
Date: 2017-08-01 20:17:13
Message-ID: CAMkU=1yD+5vLUyJuzML6DYvPKMrAxpD_eKsmL=2JqdC81uLejQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Aug 1, 2017 at 9:24 AM, Dmitry Lazurkin <dilaz03(at)gmail(dot)com> wrote:

> On 08/01/2017 07:13 PM, Jeff Janes wrote:
>
> I think that HashSet is a Java-specific term. It is just a hash table in
> which there is no data to store, just the key itself (and probably a cash
> of the hashcode of that key), correct?
>
>
> Yes. And in Java HashSet implemented on top of HashMap (:
>
> I think a more general solution would be to get the planner and executor
> to run the in-list query using the Hash Join, the same way it runs the
> in-VALUES one.
>
>
> Have additional plan nodes big overhead?
>

I don't think a new plan node will have meaningfully more overhead than new
code of equal generality and robustness which has just been bolted on to
the side of an existing node. I don't know how to test that, though. If
the existing hash code is slow, it would probably be better to spend time
improving it for everyone than coming up with yet another implementation.
I know the existing simplehash.h code as specialized for tuples in
src/backend/executor/execGrouping.c
is horribly inefficient with memory usage when the tuples are skinny, but
that shouldn't be a problem for the number of values likely to be present
in a hard-coded in-list.

>
>
> I was impressed at how well the JSON and hstore worked, you might want to
> look at how they do it. It is must be using an internal hash table of some
> sort.
>
> JSONB and HSTORE keep sorted pairs and use binary search.
>
Ah. that would be another way to approach it. How many types have btree
ordering functions without hashing functions, or the reverse?

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Alex Samad 2017-08-01 23:27:27 Re: Question about loading up a table
Previous Message Merlin Moncure 2017-08-01 18:54:17 Re: Shared Constants in PLPGSQL