Re: Consider the number of columns in the sort cost model

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Consider the number of columns in the sort cost model
Date: 2024-10-15 06:20:49
Message-ID: ae713843-0bb7-4726-82c9-0adae203357e@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/15/24 12:15, David Rowley wrote:
> As for your patch, I'm suspicious that the general idea you're
> proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate
patch. The main purpose here is to debate the way & introduce
considering of number of columns. Conservative development approach has
been preferred before.
>
> it seems you're just charging cpu_operator_cost * <number of columns
> to sort by>. It seems like it won't be very hard to fool that into
> doing the wrong thing when the first column to sort by is distinct or
> almost distinct. There's going to be far fewer or no tiebreaker
> comparisons for that case.
As I've written above, it is for starters. It allow to analyse how the
balance between different types of orderings can be changed in extreme
cases. I can join this patch with the following, implementing
differentiation by distincts, but the effect on regression tests will be
smoothed out.
The primary idea is 1) to elaborate GROUP-BY optimisation and 2) give
the optimiser a tool to choose a more optimal sort, comparing
MergeAppend/Sort/IncrementalSort costs.
The whole idea is implemented in the branch [1] and described in the
post [2]. Of course, we differentiate sortings by distinct of the first
column (only one trustworthy statistic). It is not so difficult (but I
doubt the real necessity) to use extended statistics and reflect in the
cost values of different combinations of columns.

>
> As mentioned by Kirill, I also don't understand the cost_sort
> signature change. Why would you do that over just doing
> list_length(pathkeys) within cost_sort()? Your throwing away a
> parameter that might be the most useful one of the bunch for allowing
> better sort cost estimates.
Ok, may be it is too much for an intermediate patch. We can change the
interface later, if necessary.
> Perhaps that could be done within the EquivalenceClass.
Thanks for the idea!

[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

--
regards, Andrei Lepikhov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-10-15 06:45:22 Re: replace strtok()
Previous Message Peter Eisentraut 2024-10-15 06:12:25 simplify regular expression locale global variables