From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "jian he" <jian(dot)universality(at)gmail(dot)com> |
Cc: | "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "Tom Dunstan" <pgsql(at)tomd(dot)cc>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Do we want a hashset type? |
Date: | 2023-06-15 21:05:38 |
Message-ID: | 34881a04-d5ee-4b63-9548-374b350c87d6@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jun 15, 2023, at 06:29, jian he wrote:
> I am not sure the following results are correct.
> with cte as (
> select hashset(x) as x
> ,hashset_capacity(hashset(x))
> ,hashset_count(hashset(x))
> from generate_series(1,10) g(x))
> select *
> ,'|' as delim
> , hashset_add(x,11111::int)
> ,hashset_capacity(hashset_add(x,11111::int))
> ,hashset_count(hashset_add(x,11111::int))
> from cte \gx
>
>
> results:
> -[ RECORD 1 ]----+-----------------------------
> x | {8,1,10,3,9,4,6,2,11111,5,7}
> hashset_capacity | 64
> hashset_count | 10
> delim | |
> hashset_add | {8,1,10,3,9,4,6,2,11111,5,7}
> hashset_capacity | 64
> hashset_count | 11
Nice catch, you found a bug!
Fixed in attached patch:
---
Ensure hashset_add and hashset_merge operate on copied data
Previously, the hashset_add() and hashset_merge() functions were
modifying the original hashset in-place. This was leading to unexpected
results because the original data in the hashset was being altered.
This commit introduces the macro PG_GETARG_INT4HASHSET_COPY(), ensuring
a copy of the hashset is created and modified, leaving the original
hashset untouched.
This adjustment ensures hashset_add() and hashset_merge() operate
correctly on the copied hashset and prevent modification of the
original data.
A new regression test file `reported_bugs.sql` has been added to
validate the proper functionality of these changes. Future reported
bugs and their corresponding tests will also be added to this file.
---
I wonder if this function:
static int4hashset_t *
int4hashset_copy(int4hashset_t *src)
{
return src;
}
...that was previously named hashset_copy(),
should be implemented to actually copy the struct,
instead of just returning the input?
It is being used by int4hashset_agg_combine() like this:
/* copy the hashset into the right long-lived memory context */
oldcontext = MemoryContextSwitchTo(aggcontext);
src = int4hashset_copy(src);
MemoryContextSwitchTo(oldcontext);
/Joel
Attachment | Content-Type | Size |
---|---|---|
hashset-0.0.1-da84659.patch | application/octet-stream | 4.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2023-06-15 22:15:36 | Re: subscription/033_run_as_table_owner is not listed in the meson.build |
Previous Message | Tom Lane | 2023-06-15 20:36:33 | Re: Memory leak in incremental sort re-scan |