Re: [PATCH] Teach planner to further optimize sort in distinct

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Teach planner to further optimize sort in distinct
Date: 2023-01-19 19:26:13
Message-ID: ca9cb365-0d6c-ae89-1e7b-76880c46a124@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On 19/01/23 18:49, David Rowley wrote:

> I think you should write a function like:

> bool pathkeys_count_contained_in_unordered(List *keys1, List *keys2,
> List **reorderedkeys, int *n_common)

> which works very similarly to pathkeys> _count_contained_in, but
> populates *reorderedkeys so it contains all of the keys in keys1, but
> put the matching ones in the same order as they are in keys2. If you
> find a keys2 that does not exist in keys1 then just add the additional
> unmatched keys1 keys to *reorderedkeys. Set *n_common to the number
> of common keys excluding any that come after a key2 key that does not
> exist as a key1 key.

> You can just switch to using that function in
> create_final_distinct_paths(). You'll need to consider if the query is
> a DISTINCT ON query and not try the unordered version of the function
> in that case.

Tried this, it worked as expected. Tests are green as well.

> I also just noticed that in build_index_paths() we'll leave the index
> path's pathkeys empty if we deem the pathkeys as useless. I'm not
> sure what the repercussions of setting those to the return value of
> build_index_pathkeys() if useful_pathkeys is otherwise empty.

This is very rigid indeed.

> It's possible that truncate_useless_pathkeys() needs to be modified to
> check if the pathkeys might be useful for DISTINCT

We have pathkeys_useful_for_merging and pathkeys_useful_for_ordering.

Can we not have pathkeys_useful_for_distinct?

Also, pathkeys_useful_for_ordering calls pathkeys_count_contained_in.

We can add code path on similar lines.

> When I
> thought about it I assumed that we always set IndexPath's pathkeys to
> whatever (if any) sort order that the index provides.

Can we not added original path keys in IndexPath? It could be useful

at other places as well. Atleast, I can see it useful in sorting cases.

> makes me wonder if this entire patch is a good idea.

We are still getting some benefit even without index paths for now.

I have attached patch with pathkeys_count_contained_in_unordered

and corresponding changes in create_final_distinct_paths for reference.

Thanks,

Ankit

Attachment Content-Type Size
better-distinct-v2.patch text/x-patch 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-01-19 19:31:25 Re: Extracting cross-version-upgrade knowledge from buildfarm client
Previous Message Robert Haas 2023-01-19 19:17:35 Re: almost-super-user problems that we haven't fixed yet