Re: Reordering DISTINCT keys to match input path's pathkeys

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Reordering DISTINCT keys to match input path's pathkeys
Date: 2024-11-14 01:09:32
Message-ID: CAMbWs4_mC81kpuHTABY2BQup+6ZFSYDPMD9RWykFuNA1WXBajw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 13, 2024 at 7:36 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 11/13/24 13:49, Richard Guo wrote:
> > Thanks for reviewing this patch. After some consideration, I think
> > it's not too complex to also apply this optimization to DISTINCT ON.
> > The parser already ensures that the DISTINCT ON expressions match the
> > initial ORDER BY expressions; we just need to ensure that the
> > resulting pathkey list from the reordering matches the original
> > distinctClause pathkeys, while leaving the remaining pathkeys
> > unchanged in order. Please see attached.

> BTW have you ever thought about one more, cost-based reordering strategy?
> For now, we can reorder GROUP-BY and distinct clauses according to two
> orderings: 1) ORDER-BY order and 2) order derived from the underlying
> query tree.
> In thread [1], I try to add one more strategy that minimises the number
> of comparison operator calls. It seems that it would work the same way
> with the DISTINCT statement. Do you think it make sense in general and
> can be a possible direction of improvement for the current patch?

I haven’t had a chance to follow that thread. From a quick look at
that patch, it seems to improve the general costing logic for sorting.
If that’s the case, I think it would be beneficial in the areas where
we use cost_sort(), including in this patch.

> > I'm not sure about merging these two 'reordering' GUCs into one.
> > While they may look similar, they apply to very different scenarios.
> > However, I'm open to other suggestions.
> Sure, they enable different optimisations. But, they enable highly
> specialised optimisations. Having two GUCs looks too expensive.
> Moreover, this stuff is cost-based and should work automatically. So, I
> treat these GUCs as mostly debugging or last-chance stuff used to
> disable features during severe slowdowns or bugs. It might make sense to
> group them into a single 'Clause Reordering' parameter.

I don't want to argue too much about this, but the two types of
optimizations are quite different in terms of implementation details.
For example, with DISTINCT ON, one key requirement is ensuring that
the resulting pathkey list matches the original distinctClause
pathkeys, and this logic doesn’t exist in the GROUP BY reordering
code. If these GUCs are mainly for debugging, I think it's better to
keep them separate so that we can debug each optimization individually.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-14 01:11:41 Re: [PATCH] Refactor SLRU to always use long file names
Previous Message Peter Geoghegan 2024-11-14 00:55:52 Re: Fix for pageinspect bug in PG 17