Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "jian he" <jian(dot)universality(at)gmail(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-12 15:43:28
Message-ID: 7ff2c47b-1b38-42f2-8484-fe3102fefd8c@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 10, 2023, at 17:46, Andrew Dunstan wrote:
> Maybe you can post a full patch as well as incremental?

Attached patch is based on tvondra's last commit 375b072.

> Stylistically I think you should reduce reliance on magic numbers (like 13). Probably need some #define's?

Great idea, fixed, I've added a HASHSET_STEP definition (set to the value 13).

On Sat, Jun 10, 2023, at 17:51, jian he wrote:
> int32 value = strtol(str, &endptr, 10);
> there is no int32 value range check?
> imitate src/backend/utils/adt/int.c. the following way is what I came up with.
>
>
> int64 value = strtol(str, &endptr, 10);
>
> if (errno == ERANGE || value < INT_MIN || value > INT_MAX)

Thanks, fixed like suggested, except I used PG_INT32_MIN and PG_INT32_MAX,
which explicitly represent the maximum value for a 32-bit integer,
regardless of the platform or C implementation.

> also it will infinity loop in hashset_in if supply the wrong value....
> example select '{1,2s}'::hashset;
> I need kill -9 to kill the process.

Thanks. I've added a new test, `sql/invalid.sql` with that example query.

Here is a summary of all other changes:

* README.md: Added sections Usage, Data types, Functions and Aggregate Functions

* Added test/ directory with some tests.

* Added "not production-ready" notice at top of README, warning for breaking
changes and no migration scripts, until our first release.

* Change version from 1.0.0 to 0.0.1 to indicate current status.

* Added CEIL_DIV macro

* Implemented hashset_in(), hashset_out()
The syntax for the serialized format is comma separated integer values
wrapped around curly braces, e.g '{1,2,3}'

* Implemented hashset_recv() to match the existing hashset_send()

* Removed/rewrote some tdigest related comments

/Joel

Attachment Content-Type Size
hashset-0.0.1-a8a282a.patch application/octet-stream 23.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2023-06-12 15:52:01 Re: RFC: Logging plan of the running query
Previous Message Alvaro Herrera 2023-06-12 15:37:07 Re: Add support for AT LOCAL