From: | Peter Geoghegan <peter(at)2ndquadrant(dot)com> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Memory usage during sorting |
Date: | 2012-04-14 13:34:56 |
Message-ID: | CAEYLb_Uhwz0BRQR42GJvmcUtSxDW1WYOzD5fOzE-3b1jPrCGpQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 14 April 2012 13:32, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
>> Well, timsort is specifically designed to take advantage of pre-sorted
>> data. It does appear to have a lot of traction, as wikipedia points
>> out:
>
> I hadn't heard of it. But reading up on it it does seem like a good
> fit for us. It trades some additional storage -- an array of pointers
> into the sort array where in our case the pointers would be much
> smaller than a whole SortTuple -- for fewer comparisons -- which in
> our case are often much slower than a simple integer comparison.
I wouldn't go so far as to suggest getting rid of quicksort, of
course. Quicksort is generally faster than other average case O(n log
n) algorithms in practice, for various reasons, principal among them
that it takes advantage of memory locality so well. I don't think that
it's a coincidence that timsort is popular in Python and Java land.
Even the Python C implementation has to sort integers through all that
PyObject reference indirection, I think. I'd now speculate that an
appropriate use of this algorithm might be to simply use it with types
that don't have a SortSupport function, that are largely passed by
reference, and have expensive comparisons. FWIW, I started playing
with adding timsort to Postgres last night:
https://github.com/Peter2ndQuadrant/postgres/tree/timsort
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2012-04-14 13:54:39 | Re: Patch: add timing of buffer I/O requests |
Previous Message | Magnus Hagander | 2012-04-14 13:05:50 | Re: column name of pg_stat_replication.backend_start |