Re: Remove useless GROUP BY columns considering unique index

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Zhang Mingli <zmlpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>
Subject: Re: Remove useless GROUP BY columns considering unique index
Date: 2024-12-12 03:38:57
Message-ID: f358f934-44d6-4c17-83fe-d61c5c89e191@gmail.com
Lists: pgsql-hackers

On 12/12/24 10:09, David Rowley wrote:
> On Mon, 2 Dec 2024 at 17:18, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> Patch 0002 looks helpful and performant. I propose to check 'relid > 0'
>> to avoid diving into 'foreach(lc, parse->rtable)' at all if nothing has
>> been found.
>
> I did end up adding another fast path there, but I felt like checking
> relid > 0 wasn't as good as it could be as that would have only
> short-circuited when we don't see any Vars of level 0 in the GROUP BY.
> It seemed cheap enough to short-circuit when none of the relations
> mentioned in the GROUP BY have multiple columns mentioned.
Your solution seems much better than my proposal. Thanks for applying it!
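
To illustrate the fast path described in the quoted text, here is a
minimal sketch, not the planner's actual C code. It assumes the GROUP BY
has already been reduced to (relid, attno) pairs for level-0 Vars; the
function names and that input shape are hypothetical.

```python
def relations_worth_checking(group_by_vars):
    """Return the relids that have more than one distinct column in GROUP BY.

    group_by_vars: iterable of (relid, attno) pairs, one per level-0 Var
    found in the GROUP BY clause (hypothetical input shape).
    """
    cols_per_rel = {}
    for relid, attno in group_by_vars:
        cols_per_rel.setdefault(relid, set()).add(attno)
    # Only a relation with two or more grouped columns can have a
    # redundant one, so only those are worth a closer look.
    return {relid for relid, cols in cols_per_rel.items() if len(cols) > 1}


def can_skip_rtable_scan(group_by_vars):
    """True when no relation has multiple columns in the GROUP BY."""
    return not relations_worth_checking(group_by_vars)
```

This check covers the 'relid > 0' case too: with no level-0 Vars at all,
the set of candidate relations is empty and the scan is skipped.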

> when how do you decide if the GROUP BY should become t1.a,t1.b or
> t2.x,t2.y? It's not clear to me that using t1's columns is always
> better than using t2's. I imagine using a mix is never better, but I'm
> unsure how you'd decide which ones to use.
It depends on how you define 'better'. Right now, GROUP BY employs two
strategies to reduce path cost: 1) matching the ORDER BY clause (to avoid
a final sort); 2) matching the pathkeys of the incoming subtree (to avoid
sorting before grouping).
My idea is close to [1], where the cost depends on the estimated number
of groups in the first grouping column, because cost_sort predicts the
number of comparison operator calls based on statistics. In this case,
the choice between (x,y) and (a,b) would depend on the ndistinct of 'x'
and 'a'.
In general, this was an idea for debate, more for further development
than for the patch in this thread.
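
As a rough illustration of why ndistinct of the leading column could
matter, here is a toy model. It is not the formula from cost_sort or from
[1]. It assumes uniformly distributed, independent columns, so two random
tuples tie on a column with probability 1/ndistinct, and a later column
is compared only when all earlier ones tie. All names and numbers below
are made up.

```python
def expected_columns_compared(ndistincts):
    """Expected number of columns examined per tuple comparison.

    ndistincts: per-column ndistinct estimates, in sort-key order.
    Toy assumption: uniform, independent columns.
    """
    expected = 0.0
    tie_prob = 1.0  # probability that all previous columns tied
    for nd in ndistincts:
        expected += tie_prob
        tie_prob /= max(nd, 1.0)
    return expected


def pick_group_by(candidates):
    """Pick the candidate column list with the fewest expected comparisons.

    candidates: dict mapping a label to its list of ndistinct estimates.
    """
    return min(candidates,
               key=lambda name: expected_columns_compared(candidates[name]))


# Hypothetical statistics: 'a' has 10 distinct values, 'x' has 1000.
choice = pick_group_by({"t1.a,t1.b": [10, 1000], "t2.x,t2.y": [1000, 10]})
```

Under these assumptions the (x,y) list wins, because a high ndistinct in
the first column means the second column is rarely compared.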

[1] Consider the number of columns in the sort cost model
https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com

--
regards, Andrei Lepikhov
