From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "Tom Dunstan" <pgsql(at)tomd(dot)cc> |
Cc: | "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "jian he" <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Do we want a hashset type? |
Date: | 2023-06-14 12:57:07 |
Message-ID: | 83627ce2-236b-4a68-ac05-7398d9ec701f@app.fastmail.com |
Lists: | pgsql-hackers |
On Wed, Jun 14, 2023, at 11:44, Tomas Vondra wrote:
>> Perspective from a potential user: I'm currently working on something
>> where an array-like structure with fast membership test performance
>> would be very useful. The main type of query is doing an =ANY(the set)
>> filter, where the set could contain anywhere from very few to thousands
>> of entries (ints in our case). So we'd want the same index usage as
>> =ANY(array) but would like faster row checking than we get with an array
>> when other indexes are used.
>>
>
> We kinda already do this since PG14 (commit 50e17ad281), actually. If
> the list is long enough (9 values or more), we'll build a hash table
> during query execution. So pretty much exactly what you're asking for.
Would it be feasible to teach the planner to utilize the internal hash table of
hashset directly? In the case of arrays, the hash table construction is an
ad hoc operation, whereas with hashset, the hash table already exists, which
could potentially lead to a faster execution.
Essentially, the aim would be to support:
=ANY(hashset)
Instead of the current:
=ANY(hashset_to_array(hashset))
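To make the distinction concrete, here is a minimal sketch in Python. It is not PostgreSQL executor code: the function names are hypothetical, and the threshold constant only mirrors the "9 values or more" figure quoted above. The first function models =ANY(array), where a lookup table is built each time the filter runs; the second models the proposed =ANY(hashset), where the existing table is probed directly.

```python
# Illustrative model only; names and structure are assumptions, not PostgreSQL internals.
MIN_VALUES_FOR_HASH = 9  # mirrors the "9 values or more" figure quoted in this thread


def filter_any_array(rows, array_values):
    """Model of =ANY(array): the lookup structure is built at execution time."""
    if len(array_values) >= MIN_VALUES_FOR_HASH:
        lookup = set(array_values)  # ad hoc hash table, rebuilt on every execution
    else:
        lookup = list(array_values)  # short lists are scanned linearly
    return [row for row in rows if row in lookup]


def filter_any_hashset(rows, prebuilt_set):
    """Model of the proposed =ANY(hashset): probe the table that already exists."""
    return [row for row in rows if row in prebuilt_set]


rows = list(range(20))
members = [1, 3, 5, 7, 9, 11, 13, 15, 17]
stored_set = frozenset(members)  # stands in for a stored hashset value

via_array = filter_any_array(rows, members)
via_hashset = filter_any_hashset(rows, stored_set)
```

Both paths return the same rows; the only difference in this model is whether the build step is paid at execution time or has already been paid when the set was created.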
Thoughts?
/Joel