From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Wrong results with grouping sets |
Date: | 2024-07-15 02:45:16 |
Message-ID: | CAMbWs49rYkurn5z3JnjwBAMQ=UitG3WJMtAvS4g9w30WCYoykw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
FWIW, in addition to fixing wrong result issues for queries with
grouping sets, the changes in 0001 also improve performance for
queries that have subqueries in the grouping expressions, because
different instances of the same subquery would need to be executed
only once. As a simple example, consider
create table t (a int, b int);
insert into t select i, i from generate_series(1,10000)i;
analyze t;
-- on patched
explain (analyze, costs off)
select (select t1.b from t t2 where a = t1.a) as s1,
(select t1.b from t t2 where a = t1.a) as s2,
(select t1.b from t t2 where a = t1.a) as s3
from t t1
group by a, s1;
QUERY PLAN
------------------------------------------------------------------------------------
Group (actual time=20475.028..20480.543 rows=10000 loops=1)
Group Key: t1.a, ((SubPlan 1))
-> Sort (actual time=20475.017..20475.821 rows=10000 loops=1)
Sort Key: t1.a, ((SubPlan 1))
Sort Method: quicksort Memory: 697kB
-> Seq Scan on t t1 (actual time=7.435..20468.599 rows=10000 loops=1)
SubPlan 1
-> Seq Scan on t t2 (actual time=1.022..2.045 rows=1
loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
Planning Time: 1.561 ms
Execution Time: 20481.933 ms
(12 rows)
-- on master
explain (analyze, costs off)
select (select t1.b from t t2 where a = t1.a) as s1,
(select t1.b from t t2 where a = t1.a) as s2,
(select t1.b from t t2 where a = t1.a) as s3
from t t1
group by a, s1;
QUERY PLAN
------------------------------------------------------------------------------------
Group (actual time=20779.318..62233.526 rows=10000 loops=1)
Group Key: t1.a, ((SubPlan 1))
-> Sort (actual time=20775.125..20777.936 rows=10000 loops=1)
Sort Key: t1.a, ((SubPlan 1))
Sort Method: quicksort Memory: 697kB
-> Seq Scan on t t1 (actual time=7.492..20770.060 rows=10000 loops=1)
SubPlan 1
-> Seq Scan on t t2 (actual time=1.037..2.075 rows=1
loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
SubPlan 2
-> Seq Scan on t t2_1 (actual time=1.037..2.071 rows=1 loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
SubPlan 3
-> Seq Scan on t t2_2 (actual time=1.037..2.071 rows=1 loops=10000)
Filter: (a = t1.a)
Rows Removed by Filter: 9999
Planning Time: 1.286 ms
Execution Time: 62235.753 ms
(20 rows)
We can see that with the 0001 patch, this query runs ~3 times faster,
which is no surprise because there are 3 instances of the same
subquery in the targetlist.
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2024-07-15 02:56:42 | Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal |
Previous Message | Peter Smith | 2024-07-15 02:38:20 | Re: Pgoutput not capturing the generated columns |