Re: Consider the number of columns in the sort cost model

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Consider the number of columns in the sort cost model
Date: 2024-10-23 01:39:38
Message-ID: 94433cc8-61c7-4fd4-9e73-722be016fcd8@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15/10/2024 12:15, David Rowley wrote:
> On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> I think maybe what is worth working on is seeing if you can better
> estimate the number of tiebreak comparisons by checking the number of
> distinct values for each sort key. That might require what we
> discussed on another thread about having estimate_num_groups() better
> use the n_distinct estimate from the EquivalenceMember with the least
> distinct values. It'll be another question if all that can be done
> and this all still perform well enough for cost_sort(). You may have
> to write it first before we can figure that out. It may be possible
> to buy back some of the overheads with some additional caching...
> Perhaps that could be done within the EquivalenceClass.
It seems that your idea with an equivalence class works.
In the patchset attached, I show all the ideas (I initially wanted to
discuss it step-by-step).
In the attached, the first patch is about choosing adequate expression
from the list of EC members.
The second one is the current patch, which considers the number of sort
columns.
Third patch - elaboration of the second patch. Use only the trustful
statistics here - the number of ndistinct values on the first sorted column.
And the last patch is a demo of how I'm going to use the previous three
patches and add one more strategy to improve the order of columns in the
GROUP-BY clause.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
0001-Stabilise-incremental-sort-cost-calculation.patch text/plain 9.5 KB
0002-Consider-the-number-of-columns-in-the-cost-sort-mode.patch text/plain 13.8 KB
0003-Consider-ndistinct-on-the-first-column-in-cost_sort.patch text/plain 7.7 KB
0004-Add-GROUP-BY-strategy-put-the-most-distinct-column-a.patch text/plain 7.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2024-10-23 02:47:20 Re: Inconsistent use of relpages = -1
Previous Message David G. Johnston 2024-10-23 00:50:56 Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on