From: | 狂奔的蜗牛 <1105066510(at)qq(dot)com> |
---|---|
To: | Tender Wang <tndrwang(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | 回复: BUG #18568: BUG: Result wrong when do group by on partition table! |
Date: | 2024-08-06 11:33:16 |
Message-ID: | tencent_30047C85C6C819CECD4318951F101077CA06@qq.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Partkey's collation stored in RelOptInfo->part_scheme, and I use it to fix the bug.
The attachment is my solution!
狂奔的蜗牛
1105066510(at)qq(dot)com
------------------ 原始邮件 ------------------
发件人: "Tender Wang" <tndrwang(at)gmail(dot)com>;
发送时间: 2024年8月6日(星期二) 下午4:42
收件人: "狂奔的蜗牛"<1105066510(at)qq(dot)com>;"pgsql-bugs"<pgsql-bugs(at)lists(dot)postgresql(dot)org>;
主题: Re: BUG #18568: BUG: Result wrong when do group by on partition table!
PG Bug reporting form <noreply(at)postgresql(dot)org> 于2024年8月6日周二 15:01写道:
The following bug has been logged on the website:
Bug reference: 18568
Logged by: Webbo Han
Email address: 1105066510(at)qq(dot)com
PostgreSQL version: 16.3
Operating system: centos 7.6
Description:
First, we create one case-insensitive collation use ICU:
```sql
CREATE COLLATION case_insensitive (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
```
Then, we create the partition table, meanwhile we set the collation of
column c to `case_insensitive`,
and set partkey's collation to 'C'.
```sql
SET enable_partitionwise_aggregate TO true;
SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
SET enable_incremental_sort TO off;
CREATE TABLE pagg_tab (c text collate case_insensitive) PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('a', 'b',
'c', 'd');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('e', 'f',
'A');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('B', 'C',
'D', 'E');
INSERT INTO pagg_tab SELECT substr('abcdeABCDE', (i % 10) +1 , 1) FROM
generate_series(0, 2999) i;
ANALYZE pagg_tab;
```
We do group by on the table pagg_tab use `case_insensitive` collation, we
hope group key is case-insensitive.
but we find the execution result is not what we expected.
```shell
postgres=# SELECT c collate case_insensitive, count(c) FROM pagg_tab GROUP
BY c collate case_insensitive;
c | count
---+-------
A | 300
e | 300
E | 300
D | 300
C | 300
B | 300
d | 300
c | 300
b | 300
a | 300
(10 rows)
```
The reason is the function group_by_has_partkey() do not check partkey's
collation, that lead to explain error.
```shell
postgres=# EXPLAIN SELECT c collate case_insensitive, count(c) FROM
pagg_tab GROUP BY c collate case_insensitive ;
QUERY PLAN
--------------------------------------------------------------------------------------
Append (cost=12.00..60.15 rows=10 width=10)
-> HashAggregate (cost=12.00..12.02 rows=2 width=10)
Group Key: pagg_tab.c
-> Seq Scan on pagg_tab_p2 pagg_tab (cost=0.00..9.00 rows=600
width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_1.c
-> Seq Scan on pagg_tab_p3 pagg_tab_1 (cost=0.00..18.00
rows=1200 width=2)
-> HashAggregate (cost=24.00..24.04 rows=4 width=10)
Group Key: pagg_tab_2.c
-> Seq Scan on pagg_tab_p1 pagg_tab_2 (cost=0.00..18.00
rows=1200 width=2)
(10 rows)
```
So, group_by_has_partkey() need to verify if the partkey's collation matches
the groupkey,
meanwhile, if groupkey is RelabelType node and it's collation equal to
partkey's, it should
also set variable `found` to true.
Yeah, I can reproduce $subject on HEAD.But I found this when debug into group_by_has_partkey(), as below:
call nodeToString(groupexprs):
VAR : varcollid 16384
call nodeToString(partexpr):
VAR: varcollid 16384
So the collid of partkey and groupexpr is same, so add check here may not fix this issue.
I continue to find out why the collation id of partkey is 16384(e.g. case_insensitive). The partkey expr info is
set in set_baserel_partition_key_exprs(), which it uses partkey->parttypcoll[cnt] value not partkey->partcollation value.
And partkey->parttypcoll[cnt] is assigned from pg_attribute , which is the column c meta data.
Should we use partkey->partcollation value? I try to fix that in the attached patch. I add your case in the test, and I don't find
failed regress.
--
Tender Wang
Attachment | Content-Type | Size |
---|---|---|
0001-fix-group_by_has_partkey-bug.patch | application/octet-stream | 1.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | 狂奔的蜗牛 | 2024-08-06 11:45:59 | 回复: BUG #18568: BUG: Result wrong when do group by on partition table! |
Previous Message | Daniel Gustafsson | 2024-08-06 11:15:12 | Re: BUG #18569: Memory leak in Postgres Enterprise server |