Re: Do we want a hashset type?

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Tom Dunstan <pgsql(at)tomd(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-26 02:04:05
Message-ID: CACJufxFDM6TiM3bHXeTwqvEAnMEaD9YxoAcOGjCrxg+=u=5Vdw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 26, 2023 at 2:56 AM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:
>
> On 6/25/23 15:32, jian he wrote:
> >> Or maybe I just don't understand the proposal. Perhaps it'd be best if
> >> jian wrote a patch illustrating the idea, and showing how it performs
> >> compared to the current approach.
> >
> > Currently Joel's idea is an int4hashset, based on the code Tomas first
> > wrote. It looks like a non-nested collection of unique int4 values.
> > The external text format looks like {int4,int4,int4}.
> > The structure looks like (header + capacity slots * int4).
> > Within the capacity slots, some slots are empty and some hold unique
> > values.
> >
> > The textual form of an int4hashset looks like a one-dimensional array,
> > so I copied/imitated the code in src/backend/utils/adt/arrayfuncs.c and
> > wrote slightly more generic hashset input and output functions.
> >
> > See the attached C file.
> > It works fine for non-null input and output for {int4hashset, int8hashset,
> > timestamphashset, intervalhashset, uuidhashset}.
>
> So how do you define a table with a "set" column? I mean, with the
> original patch we could have done
>
> CREATE TABLE t (a int4hashset);
>
> and then store / query this. How do you do that with this approach?
>
> I've looked at the patch only very briefly - it's really difficult to
> grok such patches - large, with half the comments possibly obsolete etc.
> So what does reusing the array code give us, really?
>
> I'm not against reusing some of the array code, but arrays seem to be
> much more elaborate (multiple dimensions, ...) so the code needs to do
> significantly more stuff in various cases.
>
> When I previously suggested that maybe we should get "inspiration" from
> the array code, I was mostly talking about (a) type polymorphism, i.e.
> doing sets for arbitrary types, and (b) integrating this into grammar
> (instead of using functions).
>
> I don't see how copying arrayfuncs.c like this achieves either of these
> things. It still hardcodes just a handful of selected data types, and
> the array polymorphism relies on automatic creation of an array type for
> every scalar type.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

You are right.
I misread the part of sql-createtype.html that says a type's input_function
can take three arguments (cstring, oid, integer).
I thought that, when creating a data type, I could pass different parameters
to the input_function.
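
For reference, the layout discussed in the quoted text above (header +
capacity slots * int4, with some slots empty and the rest holding unique
values) can be sketched in standalone C roughly as follows. This is not the
code from the attached file or from the original patch: the names IntSet,
intset_create, intset_add, intset_contains and intset_print are hypothetical,
and the multiplicative hash, the linear probing and the per-slot "used" byte
are assumptions chosen only to keep the illustration short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical layout: a small header followed by `capacity` int32 slots,
 * followed by `capacity` one-byte "used" flags, all in one allocation.
 */
typedef struct IntSet
{
    int32_t capacity;   /* total number of slots */
    int32_t nelements;  /* number of occupied slots */
    int32_t slots[];    /* capacity values, then capacity flag bytes */
} IntSet;

static IntSet *
intset_create(int32_t capacity)
{
    size_t  size;
    IntSet *s;

    assert(capacity > 0);
    size = sizeof(IntSet) + (size_t) capacity * (sizeof(int32_t) + 1);
    s = calloc(1, size);        /* zeroed: every slot starts out empty */
    if (s != NULL)
        s->capacity = capacity;
    return s;
}

/* Starting slot for a value (assumed multiplicative hash). */
static uint32_t
intset_start(const IntSet *s, int32_t v)
{
    return ((uint32_t) v * 2654435761u) % (uint32_t) s->capacity;
}

/* Returns true if v was added, false if already present or the set is full. */
static bool
intset_add(IntSet *s, int32_t v)
{
    unsigned char *used = (unsigned char *) (s->slots + s->capacity);
    uint32_t cap = (uint32_t) s->capacity;
    uint32_t pos = intset_start(s, v);

    for (uint32_t i = 0; i < cap; i++)
    {
        uint32_t p = (pos + i) % cap;   /* linear probing */

        if (!used[p])
        {
            s->slots[p] = v;
            used[p] = 1;
            s->nelements++;
            return true;
        }
        if (s->slots[p] == v)
            return false;               /* keep values unique */
    }
    return false;                       /* full; this sketch never resizes */
}

static bool
intset_contains(const IntSet *s, int32_t v)
{
    const unsigned char *used = (const unsigned char *) (s->slots + s->capacity);
    uint32_t cap = (uint32_t) s->capacity;
    uint32_t pos = intset_start(s, v);

    for (uint32_t i = 0; i < cap; i++)
    {
        uint32_t p = (pos + i) % cap;

        if (!used[p])
            return false;               /* hit an empty slot: not present */
        if (s->slots[p] == v)
            return true;
    }
    return false;
}

/* Writes the set in the array-like text form, e.g. {int4,int4,int4}. */
static void
intset_print(const IntSet *s, FILE *out)
{
    const unsigned char *used = (const unsigned char *) (s->slots + s->capacity);
    bool first = true;

    fputc('{', out);
    for (int32_t p = 0; p < s->capacity; p++)
    {
        if (!used[p])
            continue;                   /* skip empty slots */
        if (!first)
            fputc(',', out);
        fprintf(out, "%d", (int) s->slots[p]);
        first = false;
    }
    fputs("}\n", out);
}
```

With intset_create(8), adding 1, then 42, then 1 again leaves nelements at 2.
intset_print emits the members in slot order, which in general is not the
insertion order.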
