From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Consider the number of columns in the sort cost model |
Date: | 2024-10-15 06:20:49 |
Message-ID: | ae713843-0bb7-4726-82c9-0adae203357e@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 10/15/24 12:15, David Rowley wrote:
> As for your patch, I'm suspicious that the general idea you're
> proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate
patch. The main purpose here is to debate the way & introduce
considering of number of columns. Conservative development approach has
been preferred before.
>
> it seems you're just charging cpu_operator_cost * <number of columns
> to sort by>. It seems like it won't be very hard to fool that into
> doing the wrong thing when the first column to sort by is distinct or
> almost distinct. There's going to be far fewer or no tiebreaker
> comparisons for that case.
As I've written above, it is for starters. It allow to analyse how the
balance between different types of orderings can be changed in extreme
cases. I can join this patch with the following, implementing
differentiation by distincts, but the effect on regression tests will be
smoothed out.
The primary idea is 1) to elaborate GROUP-BY optimisation and 2) give
the optimiser a tool to choose a more optimal sort, comparing
MergeAppend/Sort/IncrementalSort costs.
The whole idea is implemented in the branch [1] and described in the
post [2]. Of course, we differentiate sortings by distinct of the first
column (only one trustworthy statistic). It is not so difficult (but I
doubt the real necessity) to use extended statistics and reflect in the
cost values of different combinations of columns.
>
> As mentioned by Kirill, I also don't understand the cost_sort
> signature change. Why would you do that over just doing
> list_length(pathkeys) within cost_sort()? Your throwing away a
> parameter that might be the most useful one of the bunch for allowing
> better sort cost estimates.
Ok, may be it is too much for an intermediate patch. We can change the
interface later, if necessary.
> Perhaps that could be done within the EquivalenceClass.
Thanks for the idea!
[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort
--
regards, Andrei Lepikhov
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2024-10-15 06:45:22 | Re: replace strtok() |
Previous Message | Peter Eisentraut | 2024-10-15 06:12:25 | simplify regular expression locale global variables |