Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "Tom Dunstan" <pgsql(at)tomd(dot)cc>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "jian he" <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-14 12:57:07
Message-ID: 83627ce2-236b-4a68-ac05-7398d9ec701f@app.fastmail.com
Lists: pgsql-hackers

On Wed, Jun 14, 2023, at 11:44, Tomas Vondra wrote:
>> Perspective from a potential user: I'm currently working on something
>> where an array-like structure with fast membership test performance
>> would be very useful. The main type of query is doing an =ANY(the set)
>> filter, where the set could contain anywhere from very few to thousands
>> of entries (ints in our case). So we'd want the same index usage as
>> =ANY(array) but would like faster row checking than we get with an array
>> when other indexes are used.
>>
>
> We kinda already do this since PG14 (commit 50e17ad281), actually. If
> the list is long enough (9 values or more), we'll build a hash table
> during query execution. So pretty much exactly what you're asking for.
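For reference, that is the kind of query that already benefits from the executor-built hash table (table and column names below are made up for illustration):

    -- PostgreSQL 14+ hashes the constant list of a ScalarArrayOpExpr
    -- when it contains 9 or more values (commit 50e17ad281).
    SELECT *
    FROM orders
    WHERE customer_id = ANY (ARRAY[1,2,3,4,5,6,7,8,9,10]);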

Would it be feasible to teach the planner to use the internal hash table of a
hashset directly? For arrays, the hash table is built ad hoc at execution
time, whereas with a hashset the hash table already exists, which could
potentially make execution faster.

Essentially, the aim would be to support:

=ANY(hashset)

Instead of the current:

=ANY(hashset_to_array(hashset))
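
To illustrate with a hypothetical query (the table, column, and the $1 hashset
parameter are made up; hashset_to_array is the conversion written above):

    -- Today: expand the hashset into an array, so the executor has to
    -- build its own hash table (or scan the array) at run time.
    SELECT * FROM events WHERE user_id = ANY (hashset_to_array($1));

    -- Proposed: let the planner/executor probe the hashset's existing
    -- internal hash table directly.
    SELECT * FROM events WHERE user_id = ANY ($1);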

Thoughts?

/Joel
