Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-03 07:09:41
Message-ID: 398f39bf-54c0-e510-5507-6dab1b65d5e7@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 03/01/23 08:21, David Rowley wrote:
> On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
>> Point #1
>>
>> In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
>> sorts. We also perform 3.
> This shouldn't be too hard to do. See the code in
> select_active_windows(). You'll likely want to pay attention to the
> DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if
> the query has no DISTINCT clause. DISTINCT is evaluated after Window
> and before ORDER BY.
>
> One idea to implement this would be to adjust the loop in
> select_active_windows() so that we record any WindowClauses which have
> the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record
> those separately and append those onto the end of the actives array
> after the sort.
>
> I do think you'll likely want to put any WindowClauses which have
> pathkeys which are a true subset or true superset of the ORDER BY /
> DISTINCT pathkeys last. If they're a superset then we won't need to
> perform any additional ordering for the DISTINCT / ORDER BY clause.
> If they're a subset then we might be able to perform an Incremental
> Sort, which is likely much cheaper than a full sort. The existing
> code should handle that part. You just need to make
> select_active_windows() more intelligent.
>
> You might also think that we could perform additional optimisations
> and also adjust the ORDER BY clause of a WindowClause if it contains
> the pathkeys of the DISTINCT / ORDER BY clause. For example:
>
> SELECT *,row_number() over (order by a,b) from tab order by a,b,c;
>
> However, if you were to adjust the WindowClauses ORDER BY to become
> a,b,c then you could produce incorrect results for window functions
> that change their result based on peer rows.
>
> Note the difference in results from:
>
> create table ab(a int, b int);
> insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y;
>
> select a,b,count(*) over (order by a) from ab order by a,b;
> select a,b,count(*) over (order by a,b) from ab order by a,b;
>
Thanks, let me try this.

>> and Point #2
>>
>> Teach planner to decide which window to evaluate first based on costs.
>> Currently the first window in the query is evaluated first, there may be no
>> index to help sort the first window, but perhaps there are for other windows
>> in the query. This may allow an index scan instead of a seqscan -> sort.
> What Tom wrote about that in the first paragraph of [1] still applies.
> The problem is that if the query contains many joins that to properly
> find the cheapest way of executing the query we'd have to perform the
> join search once for each unique sort order of each WindowClause.
> That's just not practical to do from a performance standpoint. The
> join search can be very expensive. There may be something that could
> be done to better determine the most likely candidate for the first
> WindowClause using some heuristics, but I've no idea what those would
> be. You should look into point #1 first. Point #2 is significantly
> more difficult to solve in a way that would be acceptable to the
> project.
>
Okay, leaving this out for now.

--
Regards,
Ankit Kumar Pandey

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-01-03 07:28:29 Re: typos
Previous Message vignesh C 2023-01-03 07:01:10 Re: Time delayed LR (WAS Re: logical replication restrictions)