Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Dunstan" <pgsql(at)tomd(dot)cc>, "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "jian he" <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-14 07:56:21
Message-ID: dbdefbf5-0c38-4217-a5f4-0beabd53e46b@app.fastmail.com
Lists: pgsql-hackers

On Wed, Jun 14, 2023, at 06:31, Tom Dunstan wrote:
> On Mon, 12 Jun 2023 at 22:37, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> Perhaps. So you're proposing to have this as a regular built-in type?
>> It's hard for me to judge how popular this feature would be, but I guess
>> people often use arrays while they actually want set semantics ...
>
> Perspective from a potential user: I'm currently working on something
> where an array-like structure with fast membership test performance
> would be very useful. The main type of query is doing an =ANY(the set)
> filter, where the set could contain anywhere from very few to thousands
> of entries (ints in our case). So we'd want the same index usage as
> =ANY(array) but would like faster row checking than we get with an
> array when other indexes are used.

Thanks for providing an interesting use-case.

If you would like to help, a complete runnable SQL script
would be very useful: one that demonstrates exactly the
array-based queries you currently use, with random data that
resembles reality as closely as possible, i.e. the same number
of rows in the tables, a similar distribution of values, etc.
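
For illustration, such a script might look roughly like the sketch
below. The table name, column names, row counts and value distribution
are all placeholders, not taken from any real workload; the actual
numbers should be replaced with ones that match production.

```sql
-- Hypothetical schema: names and sizes are placeholders only.
CREATE TABLE items (
    id       bigint PRIMARY KEY,
    group_id int NOT NULL
);

-- Assumed data: 1M rows, group_id roughly uniform over about 10k values.
INSERT INTO items (id, group_id)
SELECT i, (random() * 9999)::int
FROM generate_series(1, 1000000) AS i;

CREATE INDEX ON items (group_id);
ANALYZE items;

-- Membership filter against an array of a couple of thousand ints,
-- built once from an uncorrelated subquery.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM items
WHERE group_id = ANY (
    ARRAY(SELECT (random() * 9999)::int
          FROM generate_series(1, 2000))
);
```

Varying the array size from a handful of entries to thousands would
cover the range mentioned above.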

This would help with the documentation,
as I think it would be good to provide usage examples
that are based on real-life scenarios.

It would also help us create realistic benchmarks when
evaluating and optimising performance.

> Our app runs connecting to either an embedded postgres database that we
> control or an external one controlled by customers - this is typically
> RDS or some other cloud vendor's DB. Having such a type as a separate
> extension would make it unusable for us until all relevant cloud
> vendors decided that it was popular enough to include - something that
> may never happen, or even if it did, not any time soon.

Good point.
