Re: POC: GROUP BY optimization

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2023-07-20 11:46:19
Message-ID: 60610df1-c32f-ebdf-e58c-7a664431f452@enterprisedb.com
Lists: pgsql-hackers

On 7/20/23 08:37, Andrey Lepikhov wrote:
> On 3/10/2022 21:56, Tom Lane wrote:
>> Revert "Optimize order of GROUP BY keys".
>>
>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>> several follow-on fixes.
>> ...
>> Since we're hard up against the release deadline for v15, let's
>> revert these changes for now.  We can always try again later.
>
> It may be time to restart the project. As a first step, I rebased the
> patch on the current master. It wasn't trivial because of some latest
> optimizations (a29eab, 1349d27 and 8d83a5d).
> Now, let's repeat the review and rewrite the current patch according to
> the reasons uttered in the revert commit.

I think the fundamental task is to make the costing more reliable, and
the commit message of 443df6e2db points out a couple of challenges in
this area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this; the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.
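
To make the idea in (2) concrete, here is a minimal sketch in Python. It
is not the patch's cost model: the GroupKey fields, the "reliable" flag,
the min_gain threshold and the uniform/independent-columns assumption are
all hypothetical, chosen only to show how reordering could be gated on
confidence in the ndistinct estimates.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class GroupKey:
    name: str
    ndistinct: float  # estimated number of distinct values
    cmp_cost: float   # relative cost of one comparison (cf. procost)
    reliable: bool    # hypothetical flag: do we trust the ndistinct estimate?


def expected_cmp_cost(keys):
    """Expected cost of comparing two tuples on the given key order.

    Assumes uniformly distributed, independent columns, so a later key is
    only compared when all earlier keys tied (probability 1/ndistinct each).
    """
    cost = 0.0
    p_reach = 1.0  # probability that every earlier key compared equal
    for k in keys:
        cost += p_reach * k.cmp_cost
        p_reach /= max(k.ndistinct, 1.0)
    return cost


def choose_order(keys, min_gain=0.1):
    """Reorder keys only if all estimates are trusted and the gain is clear."""
    keys = list(keys)
    if not all(k.reliable for k in keys):
        return keys  # low confidence: keep the order the user wrote
    # Brute force over permutations; fine for a sketch, not for many keys.
    best = min(permutations(keys), key=expected_cmp_cost)
    base = expected_cmp_cost(keys)
    if base - expected_cmp_cost(best) < min_gain * base:
        return keys  # predicted improvement too small to be worth the risk
    return list(best)


keys = [
    GroupKey("a", ndistinct=2, cmp_cost=10.0, reliable=True),
    GroupKey("b", ndistinct=1000, cmp_cost=1.0, reliable=True),
]
reordered = [k.name for k in choose_order(keys)]
```

With these made-up numbers the cheap, highly selective key "b" moves to
the front. If either key is marked unreliable, the original order is
kept, which is the "miss some opportunities, limit the risk" trade-off.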

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
