Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, David Rowley <dgrowleyml(at)gmail(dot)com>, pavelsivash(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS
Date: 2020-08-21 04:15:06
Message-ID: CAKU4AWqxP2ePgxf8-RNLXPQVOG7=CbTPbfCbtV1ebiHtMT3L6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Aug 21, 2020 at 11:50 AM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:

>
>
> On Fri, Aug 21, 2020 at 10:44 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> writes:
>> > On Fri, Aug 21, 2020 at 5:51 AM Andrew Gierth <
>> andrew(at)tao11(dot)riddles(dot)org(dot)uk>
>> > wrote:
>> >> Unless I'm missing something, it should be safe to reference output
>> >> columns that are not mentioned in any grouping set,
>>
>> > I think such columns usually are aggregation expr, If we want to push
>> down
>> > a qual which reference to an aggregation expr, we have to push down
>> > to having cause, However I am not sure such pushing down really helps.
>>
>> Well, they can either be aggregates, or functions of the grouping
>> columns. You're right that there's not much we can do (today) with
>> restrictions on aggregate outputs, but there can be value in pushing
>> down restrictions on the other sort.
>>
>> As an example, consider the regression database's tenk1 table, and
>> for argument's sake add
>>
>> regression=# create index on tenk1 (abs(hundred));
>> CREATE INDEX
>>
>> Then we can get
>>
>> regression=# explain select * from (select hundred, ten, abs(hundred) a,
>> count(*) c from tenk1 group by 1,2) ss where a = 42;
>> QUERY PLAN
>>
>>
>> ------------------------------------------------------------------------------------
>> HashAggregate (cost=225.98..227.18 rows=96 width=20)
>> Group Key: tenk1.hundred, tenk1.ten
>> -> Bitmap Heap Scan on tenk1 (cost=5.06..225.23 rows=100 width=8)
>> Recheck Cond: (abs(hundred) = 42)
>> -> Bitmap Index Scan on tenk1_abs_idx (cost=0.00..5.04
>> rows=100 width=0)
>> Index Cond: (abs(hundred) = 42)
>> (6 rows)
>>
>> which is a lot cheaper than the pure seqscan you get with no pushed-down
>> condition.
>>
>> One thing that I find curious is that if I alter this example to use
>> grouping sets, say
>>
>> regression=# explain select * from (select hundred, ten, abs(hundred) a,
>> count(*) c from tenk1 group by grouping sets (1,2)) ss where a = 42;
>> QUERY PLAN
>> -----------------------------------------------------------------
>> HashAggregate (cost=495.00..546.65 rows=2 width=20)
>> Hash Key: tenk1.hundred
>> Hash Key: tenk1.ten
>> Filter: (abs(tenk1.hundred) = 42)
>> -> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=8)
>> (5 rows)
>>
>> i.e. it's not seeing the abs() condition as pushable below the
>> aggregation. I'm not quite sure if that's a necessary restriction
>> or a missed optimization.
>>
>> regards, tom lane
>>
>
> Both of the queries can push down the qual "a = 42" to the
> subquery->havingQual
> since we have group-by clause, this method unify the process for
> aggregation call
> and non-aggregation expr. . so it become to
>
> select .. from (select .. from tenk1 group .. having (abs(hundred) = 2);
>
>
> later in the subquery_planner, we will try to pull the having clause to
> where clause.
> then the Q2 failed to do so.
>
> /*
> * In some cases we may want to transfer a HAVING clause into
> WHERE. We
> * cannot do so if the HAVING clause contains aggregates
> (obviously) or
> * volatile functions (since a HAVING clause is supposed to be
> executed
> * only once per group). We also can't do this if there are any
> nonempty
> * grouping sets; moving such a clause into WHERE would
> potentially change
> * the results, if any referenced column isn't present in all the
> grouping
> * sets. (If there are only empty grouping sets, then the HAVING
> clause
> * must be degenerate as discussed below.)
> */
>
> I'm still trying to understand the comment, though.
>
>
>
This should be a correct behavior, we should not push down in the Q2
case. Here is an example:

regression=# create table tgs(a int, b int);
CREATE TABLE
regression=# insert into tgs values(1, 1), (1, 2), (2, 2);
INSERT 0 3
regression=# select * from (select a, b, count(*) from tgs group by
grouping sets((a), (b))) t where b = 1;
a | b | count
---+---+-------
| 1 | 1
(1 row)

regression=# select * from (select a, b, count(*) from tgs group by
grouping sets((a), (b)) having b = 1) t;
a | b | count
---+---+-------
| 1 | 1
(1 row)

regression=# select * from (select a, b, count(*) from tgs where b = 1
group by grouping sets((a), (b)) ) t;
a | b | count
---+---+-------
1 | | 1
| 1 | 1
(2 rows)

At the same time, our optimizer is smart enough to handle the below case
(only 1 set in group sets, which equals
group by).

regression=# explain select * from (select a, b, count(*) from tgs group by
grouping sets((a, b)) ) t where b = 1;
QUERY PLAN
-----------------------------------------------------------------
GroupAggregate (cost=38.44..38.66 rows=11 width=16)
Group Key: tgs.a, tgs.b
-> Sort (cost=38.44..38.47 rows=11 width=8)
Sort Key: tgs.a
-> Seq Scan on tgs (cost=0.00..38.25 rows=11 width=8)
Filter: (b = 1)
(6 rows)

--
Best Regards
Andy Fan

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrew Gierth 2020-08-21 13:04:50 Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS
Previous Message Andy Fan 2020-08-21 03:50:10 Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS