Re: Wrong results with grouping sets

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong results with grouping sets
Date: 2024-08-02 09:45:35
Message-ID: CAMbWs4_2t2pqqCFdS3NYJLwMMkAzYQKBOhKweFt-wE3YOi7rGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been looking at cases where there are grouping-set keys that
reduce to Consts, and I noticed a plan with v11 patch that is not very
great.

explain (verbose, costs off)
select 1 as one group by rollup(one) order by one nulls first;
QUERY PLAN
-------------------------------
Sort
Output: (1)
Sort Key: (1) NULLS FIRST
-> GroupAggregate
Output: (1)
Group Key: (1)
Group Key: ()
-> Sort
Output: (1)
Sort Key: (1)
-> Result
Output: 1
(12 rows)

The Sort operation below the Agg node is unnecessary because the
grouping key is actually a Const. This plan results from wrapping the
Const in a PlaceHolderVar to carry the nullingrel bit of the RTE_GROUP
RT index, as it can be nulled by the grouping step. Although we
remove this nullingrel bit when generating the groupClause pathkeys
since we know the groupClause is logically below the grouping step, we
do not unwrap the PlaceHolderVar.

This suggests that we might need a mechanism to unwrap PHVs when safe.
0003 includes a flag in PlaceHolderVar to indicate whether it is safe
to remove the PHV and use its contained expression instead when its
phnullingrels becomes empty. Currently it is set true only in cases
where the PHV is used to carry the nullingrel bit of the RTE_GROUP RT
index. With 0003 the plan above becomes more reasonable:

explain (verbose, costs off)
select 1 as one group by rollup(one) order by one nulls first;
QUERY PLAN
-----------------------------
Sort
Output: (1)
Sort Key: (1) NULLS FIRST
-> GroupAggregate
Output: (1)
Group Key: 1
Group Key: ()
-> Result
Output: 1
(9 rows)

This could potentially open up opportunities for optimization by
unwrapping PHVs in other cases. As an example, consider

explain (costs off)
select * from t t1 left join
lateral (select t1.a as x, * from t t2) s on true
where t1.a = s.a;
QUERY PLAN
----------------------------
Nested Loop
-> Seq Scan on t t1
-> Seq Scan on t t2
Filter: (t1.a = a)
(4 rows)

The target entry s.x is wrapped in a PHV that contains lateral
reference to t1, which forces us to resort to nestloop join. However,
since the left join has been reduced to an inner join, we should be
able to remove this PHV and use merge or hash joins instead. I did
not implement this optimization in 0003. It seems that it should be
addressed in a separate patch.

Thanks
Richard

Attachment Content-Type Size
v12-0001-Introduce-an-RTE-for-the-grouping-step.patch application/octet-stream 55.5 KB
v12-0002-Mark-expressions-nullable-by-grouping-sets.patch application/octet-stream 26.3 KB
v12-0003-Unwrap-a-PlaceHolderVar-when-safe.patch application/octet-stream 7.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2024-08-02 10:12:52 Re: Adding OLD/NEW support to RETURNING
Previous Message Ashutosh Bapat 2024-08-02 09:31:06 Re: Small refactoring around vacuum_open_relation