From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Some revises in adding sorting path |
Date: | 2023-02-21 09:02:44 |
Message-ID: | CAMbWs4-X42MCvgZUWjC3VrTXm5eQzbKxQBLapwKgjJehQZ0JYQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Feb 14, 2023 at 10:53 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I looked at the three patches and have some thoughts:
Thanks for reviewing!
> 0001:
>
> Does the newly added test have to be this complex? I think it might
> be better to just come up with the most simple test you can that uses
> an incremental sort. I really can't think why the test requires a FOR
> UPDATE, to test incremental sort, for example. The danger with making
> a test more complex than it needs to be is that it frequently gets
> broken by unrelated changes. The complexity makes it harder for
> people to understand the test's intentions and that increases the risk
> that the test eventually does not test what it was originally meant to
> test as the underlying code changes and the expected output is
> updated.
That makes sense. I agree that we should always use the minimal query
for test. For this patch, as pointed by Etsuro, it may not be a good
idea for the change. So I'll remove 0001.
> 0002:
>
> I think the following existing comment does not seem to be true any longer:
>
> > However, there's one more
> > * possibility: it may make sense to sort the cheapest partial path
> > * according to the required output order and then use Gather Merge.
>
> You've removed the comment that talks about trying Incremental Sort ->
> Gather Merge paths yet the code is still doing that, the two are just
> more consolidated now, so perhaps you need to come up with a new
> comment to explain what we're trying to achieve.
Yes, you are right. How about the comment below?
- * possibility: it may make sense to sort the cheapest partial path
- * according to the required output order and then use Gather Merge.
+ * possibility: it may make sense to sort the cheapest partial path or
+ * incrementally sort any partial path that is partially sorted according
+ * to the required output order and then use Gather Merge.
Looking at the codes now I have some concern that what we do in
create_ordered_paths for partial paths may have already been done in
generate_useful_gather_paths, especially when query_pathkeys is equal to
sort_pathkeys. Not sure if this is a problem. And the comment there
mentions generate_gather_paths(), but ISTM we should mention what
generate_useful_gather_paths has done.
> 0003:
>
> Looking at gather_grouping_paths(), I see it calls
> generate_useful_gather_paths() which generates a bunch of Gather Merge
> paths after sorting the cheapest path and incrementally sorting any
> partially sorted paths. Then back in gather_grouping_paths(), we go
> and create Gather Merge paths again, but this time according to the
> group_pathkeys instead of the query_pathkeys. I know you're not really
> changing anything here, but as I'm struggling to understand why
> exactly we're creating two sets of Gather Merge paths, it makes it a
> bit scary to consider changing anything in this area. I've not really
> found any comments that can explain to me sufficiently well enough so
> that I understand why this needs to be done. Do you know?
I'm also not sure why gather_grouping_paths creates Gather Merge paths
according to group_pathkeys after what generate_useful_gather_paths has
done. There is comment that mentions this but seems more explanation is
needed.
* generate_useful_gather_paths does most of the work, but we also consider a
* special case: we could try sorting the data by the group_pathkeys and then
* applying Gather Merge.
It seems that if there is available group_pathkeys, we will set
query_pathkeys to group_pathkeys because we want the result sorted for
grouping. In this case gather_grouping_paths may just generate
duplicate Gather Merge paths because generate_useful_gather_paths has
generated Gather Merge paths according to query_pathkeys. I tried to
reduce the code of gather_grouping_paths to just call
generate_useful_gather_paths and found no diffs in regression tests.
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2023-02-21 09:34:11 | Re: Incorrect command tag row count for MERGE with a cross-partition update |
Previous Message | Richard Guo | 2023-02-21 08:55:24 | Re: Some revises in adding sorting path |