Re: Wrong results with grouping sets

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong results with grouping sets
Date: 2024-08-06 08:17:12
Message-ID: CAMbWs4_fnO0orTMZG4DejaAozmm+mJneZxPbio8n4xxn8dHiyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 5, 2024 at 5:42 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> I found a bug in the v6 patch. The following query would trigger the
> Assert in make_restrictinfo that the given subexpression should not be
> an AND clause.
>
> select max(a) from t group by a > b and a = b having a > b and a = b;
>
> This is because the expression 'a > b and a = b' in the HAVING clause is
> replaced by a Var that references the GROUP RTE. When we preprocess the
> columns of the GROUP RTE, we do not know whether the grouped expression
> is a havingQual or not, so we do not perform make_ands_implicit for it.
> As a result, after we replace the group Var in the HAVING clause with
> the underlying grouping expression, we will have a havingQual that is an
> AND clause.
>
> As we know, in the planner we need to first preprocess all the columns
> of the GROUP RTE. We also need to replace any Vars in the targetlist
> and HAVING clause that reference the GROUP RTE with the underlying
> grouping expressions. To fix the mentioned issue, I choose the perform
> this replacement before we preprocess the targetlist and havingQual, so
> that the make_ands_implicit would be performed when we preprocess the
> havingQual.

I've realized that there is something wrong with this conclusion. If
we perform the replacement of GROUP Vars with the underlying grouping
expressions before we've done with expression preprocessing on
targetlist and havingQual, we may end up with failing to match the
expressions that are part of grouping items to lower target items.
Consider:

create table t (a int, b int);
insert into t values (1, 2);

select a < b and b < 3 from t group by rollup(a < b and b < 3)
having a < b and b < 3;

The expression preprocessing process would convert the HAVING clause
to implicit-AND format and thus it would fail to be matched to lower
target items.

Another example is:

create table t1 (a boolean);
insert into t1 values (true);

select not a from t1 group by rollup(not a) having not not a;

This HAVING clause 'not not a' would be reduced to 'a' and thus fail
to be matched to lower tlist.

I fixed this issue in v13 by performing the replacement of GROUP Vars
after we've done with expression preprocessing on targetlist and
havingQual. An ensuing effect of this approach is that a HAVING
clause may contain expressions that are not fully preprocessed if they
are part of grouping items. This is not an issue as long as the
clause remains in HAVING. But if the clause is moved or copied into
WHERE, we need to re-preprocess these expressions. Please see the
attached for the changes.

Thanks
Richard

Attachment Content-Type Size
v13-0001-Introduce-an-RTE-for-the-grouping-step.patch application/octet-stream 54.5 KB
v13-0002-Mark-expressions-nullable-by-grouping-sets.patch application/octet-stream 26.3 KB
v13-0003-Unwrap-a-PlaceHolderVar-when-safe.patch application/octet-stream 7.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-08-06 08:18:05 Re: [HACKERS] make async slave to wait for lsn to be replayed
Previous Message Zhijie Hou (Fujitsu) 2024-08-06 08:15:27 RE: Conflict detection and logging in logical replication