Re: Wrong results with grouping sets

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong results with grouping sets
Date: 2024-07-15 02:45:16
Message-ID: CAMbWs49rYkurn5z3JnjwBAMQ=UitG3WJMtAvS4g9w30WCYoykw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

FWIW, in addition to fixing wrong result issues for queries with
grouping sets, the changes in 0001 also improve performance for
queries that have subqueries in the grouping expressions, because
different instances of the same subquery would need to be executed
only once. As a simple example, consider

create table t (a int, b int);
insert into t select i, i from generate_series(1,10000)i;
analyze t;

-- on patched
explain (analyze, costs off)
select (select t1.b from t t2 where a = t1.a) as s1,
(select t1.b from t t2 where a = t1.a) as s2,
(select t1.b from t t2 where a = t1.a) as s3
from t t1
group by a, s1;
QUERY PLAN
------------------------------------------------------------------------------------
Group (actual time=20475.028..20480.543 rows=10000 loops=1)
Group Key: t1.a, ((SubPlan 1))
-> Sort (actual time=20475.017..20475.821 rows=10000 loops=1)
Sort Key: t1.a, ((SubPlan 1))
Sort Method: quicksort Memory: 697kB
-> Seq Scan on t t1 (actual time=7.435..20468.599 rows=10000 loops=1)
SubPlan 1
-> Seq Scan on t t2 (actual time=1.022..2.045 rows=1
loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
Planning Time: 1.561 ms
Execution Time: 20481.933 ms
(12 rows)

-- on master
explain (analyze, costs off)
select (select t1.b from t t2 where a = t1.a) as s1,
(select t1.b from t t2 where a = t1.a) as s2,
(select t1.b from t t2 where a = t1.a) as s3
from t t1
group by a, s1;
QUERY PLAN
------------------------------------------------------------------------------------
Group (actual time=20779.318..62233.526 rows=10000 loops=1)
Group Key: t1.a, ((SubPlan 1))
-> Sort (actual time=20775.125..20777.936 rows=10000 loops=1)
Sort Key: t1.a, ((SubPlan 1))
Sort Method: quicksort Memory: 697kB
-> Seq Scan on t t1 (actual time=7.492..20770.060 rows=10000 loops=1)
SubPlan 1
-> Seq Scan on t t2 (actual time=1.037..2.075 rows=1
loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
SubPlan 2
-> Seq Scan on t t2_1 (actual time=1.037..2.071 rows=1 loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
SubPlan 3
-> Seq Scan on t t2_2 (actual time=1.037..2.071 rows=1 loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
Planning Time: 1.286 ms
Execution Time: 62235.753 ms
(20 rows)

We can see that with the 0001 patch, this query runs ~3 times faster,
which is no surprise because there are 3 instances of the same
subquery in the targetlist.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2024-07-15 02:56:42 Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal
Previous Message Peter Smith 2024-07-15 02:38:20 Re: Pgoutput not capturing the generated columns