Quick Links

Re: Consider the number of columns in the sort cost model

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Consider the number of columns in the sort cost model
Date:	2024-10-15 06:20:49
Message-ID:	ae713843-0bb7-4726-82c9-0adae203357e@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 10/15/24 12:15, David Rowley wrote:
> As for your patch, I'm suspicious that the general idea you're
> proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate
patch. The main purpose here is to debate the way & introduce
considering of number of columns. Conservative development approach has
been preferred before.
>
> it seems you're just charging cpu_operator_cost * <number of columns
> to sort by>. It seems like it won't be very hard to fool that into
> doing the wrong thing when the first column to sort by is distinct or
> almost distinct. There's going to be far fewer or no tiebreaker
> comparisons for that case.
As I've written above, it is for starters. It allow to analyse how the
balance between different types of orderings can be changed in extreme
cases. I can join this patch with the following, implementing
differentiation by distincts, but the effect on regression tests will be
smoothed out.
The primary idea is 1) to elaborate GROUP-BY optimisation and 2) give
the optimiser a tool to choose a more optimal sort, comparing
MergeAppend/Sort/IncrementalSort costs.
The whole idea is implemented in the branch [1] and described in the
post [2]. Of course, we differentiate sortings by distinct of the first
column (only one trustworthy statistic). It is not so difficult (but I
doubt the real necessity) to use extended statistics and reflect in the
cost values of different combinations of columns.

>
> As mentioned by Kirill, I also don't understand the cost_sort
> signature change. Why would you do that over just doing
> list_length(pathkeys) within cost_sort()? Your throwing away a
> parameter that might be the most useful one of the bunch for allowing
> better sort cost estimates.
Ok, may be it is too much for an intermediate patch. We can change the
interface later, if necessary.
> Perhaps that could be done within the EquivalenceClass.
Thanks for the idea!

[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

--
regards, Andrei Lepikhov

In response to

Re: Consider the number of columns in the sort cost model at 2024-10-15 05:15:13 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2024-10-15 06:45:22	Re: replace strtok()
Previous Message	Peter Eisentraut	2024-10-15 06:12:25	simplify regular expression locale global variables