Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, David Rowley <dgrowleyml(at)gmail(dot)com>, pavelsivash(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS
Date: 2020-08-21 03:50:10
Message-ID: CAKU4AWqbcQcFFTMWbUk73tDWryDuOw4H7m082FcvTfkqgcgn3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Aug 21, 2020 at 10:44 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> writes:
> > On Fri, Aug 21, 2020 at 5:51 AM Andrew Gierth <
> andrew(at)tao11(dot)riddles(dot)org(dot)uk>
> > wrote:
> >> Unless I'm missing something, it should be safe to reference output
> >> columns that are not mentioned in any grouping set,
>
> > I think such columns usually are aggregation expr, If we want to push
> down
> > a qual which reference to an aggregation expr, we have to push down
> > to having cause, However I am not sure such pushing down really helps.
>
> Well, they can either be aggregates, or functions of the grouping
> columns. You're right that there's not much we can do (today) with
> restrictions on aggregate outputs, but there can be value in pushing
> down restrictions on the other sort.
>
> As an example, consider the regression database's tenk1 table, and
> for argument's sake add
>
> regression=# create index on tenk1 (abs(hundred));
> CREATE INDEX
>
> Then we can get
>
> regression=# explain select * from (select hundred, ten, abs(hundred) a,
> count(*) c from tenk1 group by 1,2) ss where a = 42;
> QUERY PLAN
>
>
> ------------------------------------------------------------------------------------
> HashAggregate (cost=225.98..227.18 rows=96 width=20)
> Group Key: tenk1.hundred, tenk1.ten
> -> Bitmap Heap Scan on tenk1 (cost=5.06..225.23 rows=100 width=8)
> Recheck Cond: (abs(hundred) = 42)
> -> Bitmap Index Scan on tenk1_abs_idx (cost=0.00..5.04 rows=100
> width=0)
> Index Cond: (abs(hundred) = 42)
> (6 rows)
>
> which is a lot cheaper than the pure seqscan you get with no pushed-down
> condition.
>
> One thing that I find curious is that if I alter this example to use
> grouping sets, say
>
> regression=# explain select * from (select hundred, ten, abs(hundred) a,
> count(*) c from tenk1 group by grouping sets (1,2)) ss where a = 42;
> QUERY PLAN
> -----------------------------------------------------------------
> HashAggregate (cost=495.00..546.65 rows=2 width=20)
> Hash Key: tenk1.hundred
> Hash Key: tenk1.ten
> Filter: (abs(tenk1.hundred) = 42)
> -> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=8)
> (5 rows)
>
> i.e. it's not seeing the abs() condition as pushable below the
> aggregation. I'm not quite sure if that's a necessary restriction
> or a missed optimization.
>
> regards, tom lane
>

Both of the queries can push down the qual "a = 42" to the
subquery->havingQual
since we have group-by clause, this method unify the process for
aggregation call
and non-aggregation expr. . so it become to

select .. from (select .. from tenk1 group .. having (abs(hundred) = 2);

later in the subquery_planner, we will try to pull the having clause to
where clause.
then the Q2 failed to do so.

/*
* In some cases we may want to transfer a HAVING clause into
WHERE. We
* cannot do so if the HAVING clause contains aggregates
(obviously) or
* volatile functions (since a HAVING clause is supposed to be
executed
* only once per group). We also can't do this if there are any
nonempty
* grouping sets; moving such a clause into WHERE would potentially
change
* the results, if any referenced column isn't present in all the
grouping
* sets. (If there are only empty grouping sets, then the HAVING
clause
* must be degenerate as discussed below.)
*/

I'm still trying to understand the comment, though.

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andy Fan 2020-08-21 04:15:06 Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS
Previous Message Tom Lane 2020-08-21 02:44:12 Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS