Re: Do we want a hashset type?

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Dunstan <pgsql(at)tomd(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-06-19 09:49:48
Message-ID: CACJufxHq_ZwBObSEL1wJrvDrLWcU1brsnw4+OQ+wkKqsZSBE9Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 19, 2023 at 2:51 PM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> On Mon, Jun 19, 2023, at 02:00, jian he wrote:
> > select hashset_contains('{1,2}'::int4hashset,NULL::int);
> > should return null?
>
> Hmm, that's a good philosophical question.
>
> I notice Tomas Vondra in the initial commit opted for allowing NULL inputs,
> treating them as empty sets, e.g. in int4hashset_add() we create a
> new hashset if the first argument is NULL.
>
> I guess the easiest perhaps most consistent NULL-handling strategy
> would be to just mark all relevant functions STRICT except for the agg ones
> since we probably want to allow skipping over rows with NULL values
> without the entire result becoming NULL.
>
> But if we're not just going the STRICT route, then I think it's a bit more tricky,
> since you could argue the hashset_contains() example should return FALSE
> since the set doesn't contain the NULL value, but OTOH, since we don't
> store NULL values, we don't know if it has ever been added, hence a NULL
> result would perhaps make more sense.
>
> I think I lean on thinking that if we want to be "NULL-friendly", like we
> currently are in hashset_add(), it would probably be most user-friendly
> to be consistent and let all functions return non-null return values in
> all cases where it is not unreasonable.
>
> Since we're essentially designing a set-theoretic system, I think we should
> aim for the logical "soundness" property of it and think about how we can
> verify that it is.
>
> Thoughts?
>
> /Joel

Should the hashset_to_array function be strict?

I noticed that hashset_symmetric_difference and hashset_difference handle
NULL inputs differently. It seems they should handle NULL consistently?

For arrays, a NULL element is never treated as contained or overlapping:

select '{1,2,NULL}'::int[] operator (pg_catalog.@>) '{NULL}'::int[]; --false
select '{1,2,NULL}'::int[] operator (pg_catalog.&&) '{NULL}'::int[]; --false

So, similarly, I guess hashset_contains should return false here:
select hashset_contains('{1,2}'::int4hashset,NULL::int);
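
To make the two options concrete without depending on the extension, here is a
rough sketch in plain SQL. The functions contains_strict and contains_nullsafe
are hypothetical stand-ins made up for illustration; they operate on int[]
rather than int4hashset and are not part of the hashset patch.

```sql
-- Hypothetical stand-ins for hashset_contains(), using int[] instead of
-- int4hashset so the example runs on a stock PostgreSQL install.

-- Option 1: STRICT. PostgreSQL skips the function body and returns NULL
-- as soon as any argument is NULL.
CREATE FUNCTION contains_strict(s int[], v int) RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT v = ANY (s) $$;

-- Option 2: not STRICT. A NULL set or a NULL element yields false,
-- mirroring what @> and && do for arrays.
CREATE FUNCTION contains_nullsafe(s int[], v int) RETURNS boolean
LANGUAGE sql IMMUTABLE
AS $$ SELECT coalesce(v = ANY (s), false) $$;

SELECT contains_strict('{1,2}'::int[], NULL::int);   -- NULL
SELECT contains_nullsafe('{1,2}'::int[], NULL::int); -- false
SELECT contains_nullsafe('{1,2}'::int[], 2);         -- true
```

The second variant is the behaviour I would guess matches the array operators
above; whether hashset_contains should follow it is still the open question.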
