From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <peter(at)2ndquadrant(dot)com> |
Cc: | PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Greg S <stark(at)mit(dot)edu> |
Subject: | Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) |
Date: | 2012-04-24 21:01:29 |
Message-ID: | CA+Tgmoa8_u=HQjpWvfrBdqDRCidyZsF05RC24ogtvPxWHuhCWQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 24, 2012 at 12:51 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> On 24 April 2012 16:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> If they are in sorted order with an empty string
>> appended onto the end, it takes about 25 seconds.
>
> That's exactly what I'd have expected, but was surprised to have not
> found with my own test. Perhaps it was same kind of fluke (i.e. a
> re-creatable one - I'm quite confident that my benchmark was not
> methodologically flawed, at least in execution).
Oh, I read your results as showing something quite similar.
>> Based on that, I'm inclined to propose rejiggering things so that the
>> presorted-input check runs only at the top level, and not during any
>> recursive steps. The theory is that it won't cost much because if the
>> data is randomly ordered we won't do many comparisons before falling
>> out, and that seems to be true. But the only point of doing it in the
>> recursive steps is to win when the data is partially ordered, and we
>> actually seem to be losing big-time in that case, perhaps because when
>> the data is partially ordered, the presort check will frequently to
>> run through a significant part of the array - wasting cycles - but
>> fall out before it actually gets to the end.
>
> That makes sense to me, but obviously more data is needed here.
What more data do you think is needed? I've been suspicious of that
code since the first time I looked at it, and I'm now fairly well
convinced that it's full of suckitude. Honestly, I'm not sure I could
manage to contrive a case where that code wins if I set out to do so.
>> Of course, even if we did that, it might not be as good as your
>> timsort numbers, but that doesn't mean we shouldn't do it...
>
> The frustrating thing about my timsort numbers, as I mentioned in
> reply to Dimitri (He modified the subject a bit, so that might appear
> to be a different thread to you), is that they appear to be almost
> consistently winning when you consider the number of comparisons, but
> in fact lose when you measure the duration of execution or TPS or
> whatever. I expected a certain amount of this, but not enough to
> entirely derail the case for replacing quicksort with timsort when
> sorting a single key of text, which is obviously the really compelling
> case for optimisation here. This situation is only going to be made
> "worse" by the work you've done on SortSupport for text, ...
That is quite baffling. Have you profiled it at all?
> ...which,
> incidentally, I agree is worthwhile.
Cool, thanks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2012-04-24 23:16:01 | Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) |
Previous Message | Robert Haas | 2012-04-24 20:55:18 | Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) |