Re: Do we want a hashset type?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Dunstan <pgsql(at)tomd(dot)cc>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joel Jacobson <joel(at)compiler(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-14 09:44:23
Message-ID: d52abbc7-f474-b8f4-0c8a-11bbb1bedb0e@enterprisedb.com
Lists: pgsql-hackers

On 6/14/23 06:31, Tom Dunstan wrote:
> On Mon, 12 Jun 2023 at 22:37, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com>
> wrote:
>
> > Perhaps. So you're proposing to have this as a regular built-in type?
> > It's hard for me to judge how popular this feature would be, but I guess
> > people often use arrays while they actually want set semantics ...
>
>
> Perspective from a potential user: I'm currently working on something
> where an array-like structure with fast membership test performance
> would be very useful. The main type of query is doing an =ANY(the set)
> filter, where the set could contain anywhere from very few to thousands
> of entries (ints in our case). So we'd want the same index usage as
> =ANY(array) but would like faster row checking than we get with an array
> when other indexes are used.
>

We kinda already do this since PG14 (commit 50e17ad281), actually. If
the list is long enough (9 values or more), we'll build a hash table
during query execution. So pretty much exactly what you're asking for.
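
To illustrate the idea (not the actual executor code), here is a rough
Python sketch of what "scan short lists, hash long ones" means. The
constant and function names are made up for the example, the threshold of
9 is just the number mentioned above, and NULL handling is ignored.

```python
# Illustrative sketch only; this is not PostgreSQL source code.
# HASH_THRESHOLD and make_any_matcher are hypothetical names.
HASH_THRESHOLD = 9  # "9 values or more", as described above


def make_any_matcher(values):
    """Return a predicate roughly equivalent to `x = ANY(values)`."""
    values = list(values)
    if len(values) >= HASH_THRESHOLD:
        # Long list: build a hash table once, then probe it per row.
        lookup = frozenset(values)
        return lambda x: x in lookup
    # Short list: a plain linear scan is cheap enough.
    items = tuple(values)
    return lambda x: x in items


# Usage: filter some "rows" against a list of 1000 ints.
matcher = make_any_matcher(range(0, 2000, 2))
rows = [1, 2, 1998, 1999]
matching = [r for r in rows if matcher(r)]
```

The point is that the hash table is built once per query execution, so
each row check no longer has to walk the whole list.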

> Our app runs connecting to either an embedded postgres database that we
> control or an external one controlled by customers - this is typically
> RDS or some other cloud vendor's DB. Having such a type as a separate
> extension would make it unusable for us until all relevant cloud vendors
> decided that it was popular enough to include - something that may never
> happen, or even if it did, not any time soon.
>

Understood, but that's really a problem / choice of the cloud vendors.

The thing is, adding stuff to core is not free - it means the community
becomes responsible for maintenance, testing, fixing issues, etc. It's
not feasible (or desirable) to have all extensions in core, and cloud
vendors generally do have ways to support some pre-vetted extensions
that they deem useful enough. Granted, it means vetting/maintenance for
them, but that's kinda the point of managed services. And it'd not be
free for us either.

Anyway, that's mostly irrelevant, as PG14 already builds the hash table
for this kind of query. And I'm not strictly against adding some of
this into core, if it ends up being useful enough.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
