From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Do we want a hashset type? |
Date: | 2023-06-09 10:58:18 |
Message-ID: | f5a469c2-ceeb-4631-a133-f0895786d587@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote:
> Would you be interested in helping with / working on some of that? I
> don't have immediate need for this stuff, so it's not very high on my
> TODO list.
Sure, I'm willing to help!
I've attached a patch that works on some of the items on your list,
including some additions to the README.md.
There were a bunch of places where `maxelements / 8` caused bugs,
that had to be changed to do proper integer ceiling division:
- values = (int32 *) (set->data + set->maxelements / 8);
+ values = (int32 *) (set->data + (set->maxelements + 7) / 8);
Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV macros
to the PostgreSQL source code in general, since it's easy to make this mistake,
and quite verbose/error-prone to write it out manually everywhere.
Such macros could simplify code in e.g. numeric.c.
> There's a bunch of stuff that needs to be improved to make this properly
> usable, like:
>
> 1) better hash table implementation
TODO
> 2) input/output functions
I've attempted to implement these.
I thought comma separated values wrapped around curly braces felt as the most natural format,
example:
SELECT '{1,2,3}'::hashset;
> 3) support for other types (now it only works with int32)
TODO
> 4) I wonder if this might be done as an array-like polymorphic type.
That would be nice!
I guess the work-around would be to store the actual value of non-int type
in a lookup table, and then hash the int-based primary key in such table.
Do you think later implementing polymorphic type support would
mean a more or less complete rewrite, or can we carry on with int32-support
and add it later on?
> 5) more efficient storage format, with versioning etc.
TODO
> 6) regression tests
I've added some regression tests.
> Right. IMHO the query language is a separate thing, you still need to
> evaluate the query somehow - which is where hashset applies.
Good point, I fully agree.
/Joel
Attachment | Content-Type | Size |
---|---|---|
hashset-1.0.0-joel-0001.patch | application/octet-stream | 20.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2023-06-09 11:25:32 | Re: Views no longer in rangeTabls? |
Previous Message | Tomas Vondra | 2023-06-09 10:44:46 | Re: index prefetching |