From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-02-22 06:35:41
Message-ID: CAMbWs4-wAwJgxAnUNjCz8Qgzgjwn9t_NN9=i46hVPdaOT6Jmmg@mail.gmail.com
Lists: pgsql-hackers
On Thu, Feb 22, 2024 at 12:18 PM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:
> On 22/2/2024 09:09, Richard Guo wrote:
> > I looked through the v2 patch and have two comments.
> >
> > * The test case under "Check we don't pick aggregate path key instead of
> > grouping path key" does not have EXPLAIN to show the plan. So how can
> > we ensure we do not mistakenly select the aggregate pathkeys instead of
> > the grouping pathkeys?
>
> I confirm it works correctly. I am not sure about the stability of the
> number of trailing zeros in the output across platforms here:
> avg
> --------------------
> 4.0000000000000000
> 5.0000000000000000
> That is why I used the format() function before. So, may we elaborate
> on the test and restrict the output?
The avg() function on an integer argument is commonly used in
aggregates.sql. I don't think this is an issue. See the first test
query in aggregates.sql:
SELECT avg(four) AS avg_1 FROM onek;
avg_1
--------------------
1.5000000000000000
(1 row)
> > * I don't think the test case introduced by e1b7fde418 is still needed,
> > because we already have one under "Utilize the ordering of merge join to
> > avoid a full Sort operation". This kind of test case just ensures that
> > we are able to utilize the ordering of the subplans underneath, so it
> > should be parallel to the test cases for utilizing the ordering of
> > index scans and subquery scans.
>
> Also, I'm unsure about removing the disabling of the
> max_parallel_workers_per_gather parameter. Have you confirmed that the
> current plan dominates the partial one? Could cost fluctuations across
> platforms trigger a parallel plan?
The table used for testing contains only 100 tuples, which is the size
of only one page. I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.
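For illustration only, a back-of-the-envelope sketch of why a one-page table falls far below the parallel-scan threshold (assuming the stock 8kB block size and the 8MB default for min_parallel_table_scan_size; this is my arithmetic, not planner code):

```python
# Is a one-page table eligible for a parallel scan under default thresholds?
# Assumes the stock 8kB heap page size and the default 8MB
# min_parallel_table_scan_size.

BLOCK_SIZE = 8 * 1024                            # bytes per heap page
MIN_PARALLEL_TABLE_SCAN_SIZE = 8 * 1024 * 1024   # default setting, in bytes

table_pages = 1  # ~100 narrow tuples fit comfortably in a single page
threshold_pages = MIN_PARALLEL_TABLE_SCAN_SIZE // BLOCK_SIZE

print(threshold_pages)                 # pages needed before a parallel scan is considered
print(table_pages >= threshold_pages)  # False: no parallel plan expected
```

So the table would need roughly three orders of magnitude more pages before the planner would even consider a parallel sequential scan.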
> What's more, I suggest addressing the complaint from [1] here. As I see
> it, the cost difference between the Sort and IncrementalSort strategies
> in that case is around 0.5. To make the test more stable, I propose
> changing it a bit and adding a limit:
> SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
> That makes the efficacy of IncrementalSort more obvious, with a cost
> difference of around 10 points.
I don't think that's necessary. With Incremental Sort the final cost
is:
GroupAggregate (cost=1.66..19.00 rows=100 width=25)
while with full Sort it is:
GroupAggregate (cost=16.96..19.46 rows=100 width=25)
With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost. Not to mention that even if we somehow decide
the two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much lower.
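To make the arithmetic explicit, here is a simplified sketch of the fuzzy total-cost comparison (not the planner's actual compare_path_costs_fuzzily() code, just its core inequality, with 1.01 standing in for STD_FUZZ_FACTOR):

```python
STD_FUZZ_FACTOR = 1.01  # the planner's default cost fuzz factor

def fuzzily_cheaper(total1, total2, fuzz=STD_FUZZ_FACTOR):
    """True if total1 is cheaper than total2 by more than the fuzz
    factor, i.e. the two costs are not considered fuzzily equal."""
    return total2 > total1 * fuzz

incremental_sort_total = 19.00  # GroupAggregate atop Incremental Sort
full_sort_total = 19.46         # GroupAggregate atop full Sort

# 19.46 > 19.00 * 1.01 = 19.19, so the Incremental Sort path is
# fuzzily cheaper on total cost.
print(fuzzily_cheaper(incremental_sort_total, full_sort_total))  # True
```

Since 19.46 exceeds 19.00 by more than 1%, the two paths are not even fuzzily tied, so the cheaper-startup tiebreak never has to come into play.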
Thanks
Richard