Re: Perfomance of IN-clause with many elements and possible solutions

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Dmitry Lazurkin <dilaz03(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Perfomance of IN-clause with many elements and possible solutions
Date: 2017-08-01 16:13:33
Message-ID: CAMkU=1w=B5WDufyjA_qYiRyHvnfOzVc09+=EEXyX1HRXPWDD=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Jul 31, 2017 at 12:29 PM, Dmitry Lazurkin <dilaz03(at)gmail(dot)com> wrote:

> On 31.07.2017 19:42, Jeff Janes wrote:
>
> I think it is simply because no one has gotten around to implementing it
> that way. When you can just write it as a values list instead, the
> incentive to make the regular in-list work better is not all that strong.
>
> Cheers,
>
> Jeff
>
>
> I see from explain that IN-clause uses just array with function ANY. I
> think for efficient implementation of this task I should implement new
> datatype "hashset". Am I wrong?
>

I think that HashSet is a Java-specific term. It is just a hash table in
which there is no data to store, just the key itself (and probably a cash
of the hashcode of that key), correct? PostgreSQL already has a template
for in-memory hash tables, src/include/lib/simplehash.h (and also one for
possibly-shared in-memory tables, src/backend/utils/hash/dynahash.c) , and
you should be able to specialize it for the case there there is no data
associated with the key. I think the harder part would be to get the
planner to use the hash table you implement. You would also have to
include code to fall back onto the array scanning for data types which do
not have a hash method defined.

I think a more general solution would be to get the planner and executor to
run the in-list query using the Hash Join, the same way it runs the
in-VALUES one.

I was impressed at how well the JSON and hstore worked, you might want to
look at how they do it. It is must be using an internal hash table of some
sort. But those only support strings as keys, while the in-list has to
support every data type, including user-defined-ones, so they have more
opportunities for optimization.

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dan Cooperstock at Software4Nonprofits 2017-08-01 16:21:25 Problem compiling a C function on Windows - not finding _palloc0@4
Previous Message Tom Lane 2017-08-01 15:39:43 Re: [GENERAL] Not able to create collation on Windows