From: | Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru> |
Subject: | Re: POC: GROUP BY optimization |
Date: | 2024-02-22 07:04:55 |
Message-ID: | d026580a-15a0-4734-9e69-cc6ebb70da1d@postgrespro.ru |
Lists: | pgsql-hackers |
On 22/2/2024 13:35, Richard Guo wrote:
> The avg() function on an integer argument is commonly used in
> aggregates.sql. I don't think this is an issue. See the first test
> query in aggregates.sql.
Makes sense.
> > it should be parallel to the test cases for utilizing the ordering of
> > index scans and subquery scans.
>
> Also, I'm unsure about removing the disabling of the
> max_parallel_workers_per_gather parameter. Have you verified that the
> current plan dominates the partial one? Could cost fluctuations across
> platforms trigger a parallel plan?
>
>
> The table used for testing contains only 100 tuples, which fit in a
> single page. I don't believe it would trigger any parallel plans
> unless we manually change min_parallel_table_scan_size.
I don't intend to argue the point, but just for information: I frequently
reduce it to zero, allowing PostgreSQL to make the decision based on
costs. This sometimes works much better, because a single small table in
a multi-way join can prevent an otherwise effective parallel plan.
>
> What's more, I suggest addressing the complaint from [1] here. As I see
> it, the cost difference between the Sort and IncrementalSort strategies
> in that case is around 0.5. To make the test more stable, I propose
> changing it a bit and adding a LIMIT:
> SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
> This makes the advantage of IncrementalSort more obvious, with a
> difference of around 10 cost points.
>
>
> I don't think that's necessary. With Incremental Sort the final cost is:
>
> GroupAggregate (cost=1.66..19.00 rows=100 width=25)
>
> while with full Sort it is:
>
> GroupAggregate (cost=16.96..19.46 rows=100 width=25)
>
> With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
> is cheaper on total cost. Not to mention that even if we somehow decided
> the two paths were fuzzily the same on total cost, the first path would
> still dominate because its startup cost is much lower.
As before, I won't protest here: it would need some computation of how
much cost bulk extension of the relation blocks can add. If Maxim
answers that this is enough to resolve his issue, why not?
--
regards,
Andrei Lepikhov
Postgres Professional