Re: sqlsmith crash incremental sort

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sqlsmith crash incremental sort
Date: 2020-04-17 08:41:34
Message-ID: CAMbWs4-Nq1rp_v5SonSykWOcVCK5XfUJ5F9oWa+U8X=7ronK1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 17, 2020 at 2:44 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Thu, Apr 16, 2020 at 12:04:03PM -0400, James Coleman wrote:
> >On Thu, Apr 16, 2020 at 8:22 AM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
> >> On Thu, Apr 16, 2020 at 6:35 PM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
> >>
> >> Attached is what I'm thinking about this optimization. Does it make any
> >> sense?
> >
> >Shouldn't this go one either a new thread or on the thread for the
> >patch Tomas was referencing (by Teodor I believe)?
> >
>
> FWIW the optimization I had in mind is this:
>
> https://commitfest.postgresql.org/21/1651/
>
> I now realize that was about GROUP BY, but it's not all that different
> and the concerns will / should be fairly similar, I think.
>

Thanks for pointing out this thread. Very helpful.

>
> IMO simply tweaking the sort keys to match the upper parts of the plan
> is probably way too simplistic, I'm afraid. For example, if the Unique
> significantly reduces cardinality, then the cost of the additional sort
> is much less important. It may be much better to optimize the "large"
> sort of the whole data set, either by reordering the columns as proposed
> by Teodor in his patch (by number of distinct values and/or cost of the
> comparison function function).
>

Since we don't have Teodor's patch for now, I think it is a clear win if
we can reorder the sort keys in 'Unique/SetOp->Sort->Append' path to
avoid a final Sort/Incremental Sort node, because currently the 'Sort
and Unique/SetOp' above Append simply performs sorting in the order of
columns it sees. I think this is the same logic we do for mergejoin
paths that we try to match the requested query_pathkeys to avoid a
second sort.

>
> Furthermore, this is one of the places that is not using incremental
> sort yet - I can easily imagine doing something like this:
>
>
> Sort
> -> Unique
> -> Incremenal Sort
> -> ...
>
> could be a massive win. So I think we can't just rejigger the sort keys
> abitrarily, we should / need to consider those alternatives.
>

This optimization would only apply to 'Unique/SetOp->Sort->Append' path.
I don't think it will affect our choise of incremental sort in other
cases. For example, with this optimization, we still can choose
incremental sort:

# explain (costs off)
select * from
(select distinct a, c from (select * from foo order by 1,3) as sub) as
sub1
order by 2,1;
QUERY PLAN
-------------------------------------------------------
Sort
Sort Key: sub.c, sub.a
-> Unique
-> Subquery Scan on sub
-> Incremental Sort
Sort Key: foo.a, foo.c
Presorted Key: foo.a
-> Index Scan using foo_a on foo
(8 rows)

>
> >Or are you saying you believe this patch guarantees we never see this
> >problem in incremental sort costing?
> >
>
> Yeah, that's not entirely close to me. But maybe it shows us where we to
> get the unprocessed target list?
>
>
I'm not sure if there are other cases that we would build
targetlit/pathkeys out of 'varno 0' Vars. But for this case here, the
'Unique/SetOp->Sort' above Append would sort the result of Append on all
columns, in the arbitrary order as it sees, (not based on any statistics
as Teodor's patch does), we can always reorder the sort keys trying to
match with result ordering requirements, thus to avoid the final
Sort/Incremental Sort node. So that we can prevent this problem in
incremental sort costing for this case.

Am I missing something? Please correct me.

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeremy Morton 2020-04-17 09:00:10 Re: Support for DATETIMEOFFSET
Previous Message Kyotaro Horiguchi 2020-04-17 08:41:24 Re: Race condition in SyncRepGetSyncStandbysPriority