Re: Wrong results with grouping sets

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong results with grouping sets
Date: 2024-07-16 09:40:57
Message-ID: CAExHW5uES7aeM=OOG9k9xHbVrdf1OQcTe96JOk3EX4zj1M0GPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 15, 2024 at 8:15 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

>
> We can see that with the 0001 patch, this query runs ~3 times faster,
> which is no surprise because there are 3 instances of the same
> subquery in the targetlist.

I am not sure if that's the right thing to do.

I am using a slightly elaborate version of the tests in your patch
#select v, grouping(v) gv, grouping((select t1.v from gstest5 t2 where
id = t1.id)) gs,grouping((select t1.v from gstest5 t2 where id =
t1.id)) gs2,
(select t1.v from gstest5 t2 where id = t1.id) as s,
case when grouping(v) = 0
then v
else null end as cv,
case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0
then (select t1.v from gstest5 t2 where id = t1.id)
else null end as cs
from gstest5 t1
group by grouping sets(v, s)
;
v | gv | gs | gs2 | s | cv | cs
---+----+----+-----+---+----+----
3 | 0 | 1 | 1 | | 3 |
5 | 0 | 1 | 1 | | 5 |
4 | 0 | 1 | 1 | | 4 |
2 | 0 | 1 | 1 | | 2 |
1 | 0 | 1 | 1 | | 1 |
| 1 | 0 | 0 | 2 | | 2
| 1 | 0 | 0 | 5 | | 5
| 1 | 0 | 0 | 4 | | 4
| 1 | 0 | 0 | 3 | | 3
| 1 | 0 | 0 | 1 | | 1
(10 rows)

#explain verbose select v, grouping(v) gv, grouping((select t1.v from
gstest5 t2 where id = t1.id)) gs,grouping((select t1.v from gstest5 t2
w
here id = t1.id)) gs2,
(select t1.v from gstest5 t2 where id = t1.id) as s,
case when grouping(v) = 0
then v
else null end as cv,
case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0
then (select t1.v from gstest5 t2 where id = t1.id)
else null end as cs
from gstest5 t1
group by grouping sets(v, s)
;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------
HashAggregate (cost=18508.10..58790.10 rows=2460 width=28)
Output: t1.v, GROUPING(t1.v), GROUPING((SubPlan 2)),
GROUPING((SubPlan 3)), ((SubPlan 1)), CASE WHEN (GROUPING(t1.v) = 0)
THEN t1.v ELSE NULL::integer END
, CASE WHEN (GROUPING((SubPlan 4)) = 0) THEN ((SubPlan 1)) ELSE
NULL::integer END
Hash Key: t1.v
Hash Key: (SubPlan 1)
-> Seq Scan on pg_temp.gstest5 t1 (cost=0.00..18502.45 rows=2260 width=12)
Output: t1.v, (SubPlan 1), t1.id
SubPlan 1
-> Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2 (cost=0.15..8.17 rows=1 width=4)
Output: t1.v
Index Cond: (t2.id = t1.id)

The result looks as expected but the plan isn't consistent with what
happens without grouping set
#select v,
(select t1.v from gstest5 t2 where id = t1.id) as s,
(select t1.v from gstest5 t2 where id = t1.id) as s2,
case when t1.v < 3
then (select t1.v from gstest5 t2 where id = t1.id)
else null end as cs
from gstest5 t1
order by case when t1.v < 3
then (select t1.v from gstest5 t2 where id = t1.id)
else null end
;
v | s | s2 | cs
---+---+----+----
1 | 1 | 1 | 1
2 | 2 | 2 | 2
3 | 3 | 3 |
4 | 4 | 4 |
5 | 5 | 5 |
(5 rows)

postgres(at)92841=#explain verbose select v,
(select t1.v from gstest5 t2 where id = t1.id) as s,
(select t1.v from gstest5 t2 where id = t1.id) as s2,
case when t1.v < 3
then (select t1.v from gstest5 t2 where id = t1.id)
else null end as cs
from gstest5 t1
order by case when t1.v < 3
then (select t1.v from gstest5 t2 where id = t1.id)
else null end
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Sort (cost=55573.71..55579.36 rows=2260 width=16)
Output: t1.v, ((SubPlan 1)), ((SubPlan 2)), (CASE WHEN (t1.v < 3)
THEN (SubPlan 3) ELSE NULL::integer END)
Sort Key: (CASE WHEN (t1.v < 3) THEN (SubPlan 3) ELSE NULL::integer END)
-> Seq Scan on pg_temp.gstest5 t1 (cost=0.00..55447.80 rows=2260 width=16)
Output: t1.v, (SubPlan 1), (SubPlan 2), CASE WHEN (t1.v < 3)
THEN (SubPlan 3) ELSE NULL::integer END
SubPlan 1
-> Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2 (cost=0.15..8.17 rows=1 width=4)
Output: t1.v
Index Cond: (t2.id = t1.id)
SubPlan 2
-> Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2_1 (cost=0.15..8.17 rows=1 width=4)
Output: t1.v
Index Cond: (t2_1.id = t1.id)
SubPlan 3
-> Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2_2 (cost=0.15..8.17 rows=1 width=4)
Output: t1.v
Index Cond: (t2_2.id = t1.id)
(17 rows)

Notice that every instance of that subquery has its own subplan in
this case. Why should the grouping set be different and have the same
subplan for two instances of the subquery? And if so, why not all of
the instances have the same subplan?

Since a subquery is a volatile expression, each of its instances
should be evaluated separately. If the expressions in ORDER BY,
GROUPING and GROUP BY are the same as an expression in the targetlist,
subqueries in those expressions won't need a subplan of their own. If
they are not part of targetlist, they will be added to the targetlist
as resjunk columns and thus form separate instances of subquery thus
adding more subplans.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-07-16 10:02:21 Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Previous Message Shlok Kyal 2024-07-16 09:04:12 Re: Pgoutput not capturing the generated columns