From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Incremental sort |
Date: | 2017-09-13 23:48:42 |
Message-ID: | CAPpHfdtvx1oVJHCUe1x-R=t55Uz4uBi9i55BcoTJ7zZN_7Gxfw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, May 8, 2017 at 6:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 5, 2017 at 11:13 AM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > Incremental sort is faster in vast majority of cases. It appears to be
> > slower only when whose dataset is one sort group. In this case
> incremental
> > sort is useless, and it should be considered as misuse of incremental
> sort.
> > Slowdown is related to the fact that we anyway have to do extra
> comparisons,
> > unless we somehow push our comparison result into qsort itself and save
> some
> > cpu cycles (but that would be unreasonable break of encapsulation).
> Thus,
> > in such cases regression seems to be inevitable anyway. I think we could
> > evade this regression during query planning. If we see that there would
> be
> > only few groups, we should choose plain sort instead of incremental sort.
>
> I'm sorry that I don't have time to review this in detail right now,
> but it sounds like you are doing good work to file down cases where
> this might cause regressions, which is great.
Thank you for paying attention to this patch!
> Regarding the point in
> the paragraph above, I'd say that it's OK for the planner to be
> responsible for picking between Sort and Incremental Sort in some way.
> It is, after all, the planner's job to decide between different
> strategies for executing the same query and, of course, sometimes it
> will be wrong, but that's OK as long as it's not wrong too often (or
> by too much, hopefully).
Right, I agree.
> It may be a little difficult to get this
> right, though, because I'm not sure that the information you need
> actually exists (or is reliable). For example, consider the case
> where we need to sort 100m rows and there are 2 groups. If 1 group
> contains 1 row and the other group contains all of the rest, there is
> really no point in an incremental sort. On the other hand, if each
> group contains 50m rows and we can get the data presorted by the
> grouping column, there might be a lot of point to an incremental sort,
> because two 50m-row sorts might be a lot cheaper than one 100m sort.
>
More generally, it's quite easy to imagine situations where the
> individual groups can be quicksorted but sorting all of the rows
> requires I/O, even when the number of groups isn't that big. On the
> other hand, the real sweet spot for this is probably the case where
> the number of groups is very large, with many single-row groups or
> many groups with just a few rows each, so if we can at least get this
> to work in those cases that may be good enough. On the third hand,
> when costing aggregation, I think we often underestimate the number of
> groups and there might well be similar problems here.
I agree with that. I need to test this patch more carefully in the case
when groups have different sizes. It's likely I need to add yet another
parameter to my testing script: groups count skew.
Patch rebased to current master is attached. I'm going to improve my
testing script and post new results.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
incremental-sort-8.patch | application/octet-stream | 119.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2017-09-13 23:57:32 | Re: Parallel Hash take II |
Previous Message | Tom Lane | 2017-09-13 23:48:29 | Re: psql: new help related to variables are not too readable |