Re: BUG #18568: BUG: Result wrong when do group by on partition table!

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: 1105066510(at)qq(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18568: BUG: Result wrong when do group by on partition table!
Date: 2024-08-06 08:42:53
Message-ID: CAHewXNnKLrZYG4iqaYw=uB3XWRrYRZHo7VtcMsbUEbdbajQg2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> 于2024年8月6日周二 15:01写道:

> The following bug has been logged on the website:
>
> Bug reference: 18568
> Logged by: Webbo Han
> Email address: 1105066510(at)qq(dot)com
> PostgreSQL version: 16.3
> Operating system: centos 7.6
> Description:
>
> First, we create one case-insensitive collation use ICU:
> ```sql
> CREATE COLLATION case_insensitive (
> provider = icu,
> locale = 'und-u-ks-level2',
> deterministic = false
> );
> ```
>
> Then, we create the partition table, meanwhile we set the collation of
> column c to `case_insensitive`,
> and set partkey's collation to 'C'.
> ```sql
> SET enable_partitionwise_aggregate TO true;
> SET enable_partitionwise_join TO true;
> SET max_parallel_workers_per_gather TO 0;
> SET enable_incremental_sort TO off;
> CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION
> BY LIST(c
> collate "C");
> CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a',
> 'b',
> 'c', 'd');
> CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e',
> 'f',
> 'A');
> CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B',
> 'C',
> 'D', 'E');
> INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1)
> FROM
> generate_series(0, 2999) i;
> ANALYZE pagg_tab;
> ```
>
> We do group by on the table pagg_tab use `case_insensitive` collation, we
> hope group key is case-insensitive.
> but we find the execution result is not what we expected.
> ```shell
> postgres=# SELECT c collate case_insensitive, count(c) FROM
> pagg_tab GROUP
> BY c collate case_insensitive;
> c | count
> ---+-------
> A | 300
> e | 300
> E | 300
> D | 300
> C | 300
> B | 300
> d | 300
> c | 300
> b | 300
> a | 300
> (10 rows)
> ```
>
> The reason is the function group_by_has_partkey() do not check partkey's
> collation, that lead to explain error.
> ```shell
> postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM
> pagg_tab GROUP BY c collate case_insensitive ;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------
> Append (cost=12.00..60.15 rows=10 width=10)
> -> HashAggregate (cost=12.00..12.02 rows=2 width=10)
> Group Key: pagg_tab.c
> -> Seq Scan on pagg_tab_p2 pagg_tab (cost=0.00..9.00
> rows=600
> width=2)
> -> HashAggregate (cost=24.00..24.04 rows=4 width=10)
> Group Key: pagg_tab_1.c
> -> Seq Scan on pagg_tab_p3 pagg_tab_1 (cost=0.00..18.00
> rows=1200 width=2)
> -> HashAggregate (cost=24.00..24.04 rows=4 width=10)
> Group Key: pagg_tab_2.c
> -> Seq Scan on pagg_tab_p1 pagg_tab_2 (cost=0.00..18.00
> rows=1200 width=2)
> (10 rows)
> ```
>
> So, group_by_has_partkey() need to verify if the partkey's collation
> matches
> the groupkey,
> meanwhile, if groupkey is RelabelType node and it's collation equal to
> partkey's, it should
> also set variable `found` to true.
>
>
Yeah, I can reproduce $subject on HEAD.
But I found this when debug into group_by_has_partkey(), as below:
call nodeToString(groupexprs):
VAR : varcollid 16384
call nodeToString(partexpr):
VAR: varcollid 16384

So the collid of partkey and groupexpr is same, so add check here may not
fix this issue.

I continue to find out why the collation id of partkey is 16384(e.g.
case_insensitive). The partkey expr info is
set in set_baserel_partition_key_exprs(), which it uses
partkey->parttypcoll[cnt] value not partkey->partcollation value.

And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the
column c meta data.
Should we use partkey->partcollation value? I try to fix that in the
attached patch. I add your case in the test, and I don't find
failed regress.

--
Tender Wang

Attachment Content-Type Size
v1-0001-Fix-collation-different-between-columan-collation.patch application/octet-stream 4.2 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2024-08-06 08:53:19 BUG #18570: Drop event trigger for DDL finishes successfully but trigger still executed on DDL
Previous Message PG Bug reporting form 2024-08-06 08:34:18 BUG #18569: Memory leak in Postgres Enterprise server