Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-06-24 17:00:50
Message-ID: CAAaqYe9N=BZDJ2U9FsRX1-foJu-a8u1t1KGYtvf=zJdUAO9JdA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 24, 2019 at 12:56 PM Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Mon, 24 Jun 2019 at 16:10, James Coleman <jtc331(at)gmail(dot)com> wrote:
>>
>> On Thu, Jun 13, 2019 at 11:38:12PM -0400, James Coleman wrote:
>> >I think the first thing to do is get some concrete numbers on performance if we:
>> >
>> >1. Only sort one group at a time.
>> >2. Update the costing to prefer traditional sort unless we have very
>> >high confidence we'll win with incremental sort.
>> >
>> >It'd be nice not to have to add additional complexity if at all possible.
>>
>> I've been focusing my efforts so far on seeing how much we can
>> eliminate performance penalties (relative to traditional sort). It
>> seems that if we can improve things enough there that we'd limit the
>> amount of adjustment needed to costing -- we'd still need to consider
>> cases where the lower startup cost results in picking significantly
>> different plans in a broad sense (presumably due to lower startup cost
>> and the ability to short circuit on a limit). But I'm hopeful then we
>> might be able to avoid having to consult MCV lists (and we wouldn't
>> have that available in all cases anyway).
>>
>> As I see it the two most significant concerning cases right now are:
>> 1. Very large batches (in particular where the batch is effectively
>> all of the matching rows such that we're really just doing a standard
>> sort).
>> 2. Many very small batches.
>
>
> What is the specific use case for this? This sounds like quite a general case.

They are both general cases in some sense, but the concerns lie mostly
with what happens when they're unexpectedly encountered. For example,
the expected row count or group size may be off by a good bit, so that
we effectively have to perform a sort of all (or most) matching rows.

If we can get the performance to a point where that misestimated row
count or group size doesn't much matter, then ISTM including the patch
becomes a much more obvious total win.
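For anyone following along, here's a minimal sketch of the idea we're
discussing (purely illustrative Python, not the patch's executor code;
the key/row shapes are made up for the example): when input arrives
already ordered by a prefix key, rows sharing a prefix value form a
batch, and only each batch needs sorting on the remaining key(s). The
two concerning cases above correspond to one huge batch (degenerates
into a full sort) and many tiny batches (per-batch overhead dominates):

```python
from itertools import groupby

def incremental_sort(rows, prefix_key, suffix_key):
    """rows must already be sorted by prefix_key; yields rows fully
    sorted by (prefix_key, suffix_key), one batch at a time."""
    for _, batch in groupby(rows, key=prefix_key):
        # Case 1: a single enormous batch means this sorted() call is
        # effectively a traditional full sort of all matching rows.
        # Case 2: many tiny batches pay repeated per-sort setup cost.
        yield from sorted(batch, key=suffix_key)

rows = [(1, "b"), (1, "a"), (2, "c"), (2, "a"), (3, "a")]
result = list(incremental_sort(rows, lambda r: r[0], lambda r: r[1]))
# result == [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a")]
```

The upside this models is the lower startup cost: the first batch can
be emitted before later batches are even read, which is what lets a
LIMIT short-circuit.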

> Do we know something about the nearly-sorted rows that could help us? Or could we introduce some information elsewhere that would help with the sort?
>
> Could we for-example, pre-sort the rows block by block, or filter out the rows that are clearly out of order, so we can re-merge them later?

I'm not sure what you mean by "block by block"?

James Coleman
