Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-28 02:58:30
Message-ID: 20200328025830.6v6ogkseohakc32q@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 27, 2020 at 09:36:55PM -0400, James Coleman wrote:
>On Fri, Mar 27, 2020 at 9:19 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Fri, Mar 27, 2020 at 12:51:34PM -0400, James Coleman wrote:
>> >In a previous email I'd summarized remaining TODOs I'd found. Here's
>> >an updated listed with several resolved.
>> >
>> >Resolved:
>> >
>> >2. Not marked in the patch, but in nodeIncrementalSort.c
>> >ExecIncrementalSort() I wonder if perhaps we should move the algorithm
>> >discussion comments up to the file header comment. On the other hand,
>> >I suppose it could be valuable to leave the the file header comment
>> >more high level about the mathematical properties of incremental sort
>> >rather than discussing the details of the hybrid mode.
>> >
>> >I've decided to do this, and the attached patch series includes the change.
>> >
>>
>> It's a bit tough to find the right balance what to put into the header
>> comment and what should go to function comments, but this seems mostly
>> reasonable. I wouldn't use the double-tab indentation and the copyright
>> notices should stay at the top.
>
>Fixed. I also re-ran pg_indent on the the nodeIncrementalSort.c file.
>
>> >3. nodeIncrementalSort.c ExecIncrementalSort() in the main for loop:
>> > * TODO: do we need to check for interrupts inside these loops or
>> > * will the outer node handle that?
>> >
>> >It seems like what we have is sufficient, given that the nodes (and
>> >sort) we rely on have their own calls. The one place where someone
>> >might make an argument otherwise would be in the mode transition
>> >function where we copy tuples from the full sort state to the
>> >presorted sort state. If this is a problem, let me know, and I'll
>> >change it, but I'm proceeding under the assumption for now that it's
>> >not.
>> >
>>
>> I think what we have now is sufficient.
>>
>> >4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk
>> >is suspect. I've mentioned previously I don't have a great mental
>> >model of how rescan works and its invariants (IIRC someone said it was
>> >about moving around a result set in a cursor). Regardless I'm pretty
>> >sure this code just doesn't work correctly. Additionally the sort_Done
>> >variable is poorly named; it probably would make more sense to call it
>> >something like "scanHasBegun". I'm waiting to change it though until
>> >cleaning up this code more holistically.
>> >
>> >Fixed, as described in previous email.
>> >
>> >6. regress/expected/incremental_sort.out:
>> >-- TODO if an analyze happens here the plans might change; should we
>> >-- solve by inserting extra rows or by adding a GUC that would somehow
>> >-- forcing the time of plan we expect.
>> >
>> >I've decided this doesn't seem to be a real issue, so, comment removed.
>> >
>>
>> OK
>>
>> >7. Not listed as a comment in the patch, but I need to modify the
>> >testing for analyze output to parse out the memory/disk stats to the
>> >tests are stable.
>> >
>> >Included in the attached patch series. I use plpgsql to munge out the
>> >space kB numbers. I also discovered two bugs in the JSON output along
>> >the way and fixed those (memory and disk need to be output separate;
>> >disk was using the wrong "space type" enum). Finally I also use
>> >plpgsql to check a few invariants (for now just that max space is
>> >greater than or equal to the average.
>> >
>>
>> OK
>>
>> >8. optimizer/path/allpaths.c get_useful_pathkeys_for_relation:
>> >* XXX At the moment this can only ever return a list with a single element,
>> >* because it looks at query_pathkeys only. So we might return the pathkeys
>> >* directly, but it seems plausible we'll want to consider other orderings
>> >* in the future.
>> >
>> >I think we just leave this in as a comment.
>> >
>>
>> Fine with me.
>>
>> As a side note here, I'm wondering if this (determining useful pathkeys)
>> can be made a bit smarter by looking both at query_pathkeys and pathkeys
>> useful for merging, similarly to what truncate_useless_pathkeys() does.
>> But that can be seen as an improvement of what we do now.
>
>Unless your comment below about looking at truncate_useless_pathkeys
>is implying you're considering aiming to get this in now, I wonder if
>we should just expand the comment to reference pathkeys useful for
>merging as a possible future extension.
>

Maybe. I've been thinking about how to generate those path keys, but
it's a bit tricky, because pathkeys_useful_for_merging() - the thing
that backs truncate_useless_pathkeys() actually gets pathkeys and merely
verifies if those are useful for merging. But this code needs to do the
opposite - generate those pathkeys.

But let's say we know we have a join on columns (a,b,c). For each of
those PathKey values we know it's useful for merging, but should we
generate pathkeys (a,b,c), (b,c,a), ... or any other permutation? I
suppose we can look at pathkeys for existing paths of the relation to
prune the possibilities a bit, but what then?

BTW I wonder if we actually need the ec_has_volatile check in
get_useful_pathkeys_for_relation. The comment says we can't but is it
really true? pathkeys_useful_for_ordering doesn't do any such checks, so
I'd bet this is merely an unnecessary copy-paste from postgres_fdw.
Opinions?

>> >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation:
>> >* Considering query_pathkeys is always worth it, because it might let us
>> >* avoid a local sort.
>> >
>> >That originally was a copy from the fdw code, but since the two
>> >functions have diverged (Is that concerning? I could be confusing, but
>> >isn't a compilation problem) I didn't move the function.
>> >
>>
>> I think it's OK the two functions diverged, it's simply because the FDW
>> one needs to check other things too. But I might rework this once I look
>> closer at truncate_useless_pathkeys.
>
>Agreed, for now at least. It's tempting to think they should always be
>shared, but I'm not convinced (without a lot more digging) that this
>represents structural rather than incidental duplication.
>

The more I look at pathkeys_useful_for_ordering() the more I think the
get_useful_pathkeys_for_relation() function should look more like it
than the postgres_fdw one ...

>> >I did notice though that find_em_expr_for_rel() is wholesale copied
>> >(and unchanged) from the fdw code, so I moved it to equivclass.c so
>> >both places can share it.
>> >
>>
>> +1
>>

... which would also get rid of find_em_expr_for_rel().

>> >
>> >Still remaining:
>> >
>> >1. src/backend/optimizer/util/pathnode.c add_partial_path()
>> >* XXX Perhaps we could do this only when incremental sort is enabled,
>> >* and use the simpler version (comparing just total cost) otherwise?
>> >
>> >I don't have a strong opinion here. It doesn't seem like a significant
>> >difference in terms of cost?
>> >
>> >5. planner.c create_ordered_paths:
>> >* XXX This only looks at sort_pathkeys. I wonder if it needs to look at the
>> >* other pathkeys (grouping, ...) like generate_useful_gather_paths.
>> >
>> >10. optimizer/path/allpaths.c generate_useful_gather_paths:
>> >* XXX I wonder if we need to consider adding a projection here, as
>> >* create_ordered_paths does.
>> >
>> >11. In the same function as the above:
>> >* XXX Can't we skip this (maybe only for the cheapest partial path)
>> >* when the path is already sorted? Then it's likely duplicate with
>> >* the path created by generate_gather_paths.
>> >
>> >12. In the same function as the above:
>> >* XXX This is not redundant with the gather merge path created in
>> >* generate_gather_paths, because that merely preserves ordering of
>> >* the cheapest partial path, while here we add an explicit sort to
>> >* get match the useful ordering.
>> >
>> >13. planner.c create_ordered_paths:
>> >* XXX This is probably duplicate with the paths we already generate
>> >* in generate_useful_gather_paths in apply_scanjoin_target_to_paths.
>> >
>> >Tomas, any chance you could take a look at the above XXX/questions? I
>> >believe all of them that remain relate to the planner patches.
>> >
>>
>> Yes, I'll take a look over the weekend.
>
>Awesome, thanks.
>
>I collapsed things down including the changes referenced in this
>email, since they were all comment formatting changes.
>

Thanks.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-03-28 03:17:54 Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM
Previous Message James Coleman 2020-03-28 02:25:29 Re: Make mesage at end-of-recovery less scary.