From: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
---|---|
To: | "Brian Hurt" <bhurt(at)janestcapital(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Sorting Improvements for 8.4 |
Date: | 2007-12-20 20:35:02 |
Message-ID: | D425483C2C5C9F49B5B7A41F8944154701000B4F@postal.corporate.connx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Brian Hurt
> Sent: Thursday, December 20, 2007 6:42 AM
> To: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Sorting Improvements for 8.4
>
> While we're blue skying things, I've had an idea for a sorting
algorithm
> kicking around for a couple of years that might be interesting. It's
a
> variation on heapsort to make it significantly more block-friendly. I
> have no idea if the idea would work, or how well it'd work, but it
might
> be worthwhile kicking around.
>
> Now, the core idea of heapsort is that the array is put into heap
order-
> basically, that a[i] >= a[2i+1] and a[i] >= a[2i+2] (doing the 0-based
> array version here). The problem is that, assuming that the length of
a
> is larger than memory, then a[2i+1] is likely going to be on a
different
> page or block than a[i]. That means every time you have to bubble
down
> a new element, you end up reading O(log N) blocks- this is *per
element*.
>
> The variation is to instead work with blocks, so you have a block of
> entries b[i], and you change the definition of heap order, so that
> min(b[i]) >= max(b[2i+1]) and min(b[i]) >= max(b[2i+2]). Also, during
> bubble down, you need to be carefull to only change the minimum value
of
> one of the two child blocks b[2i+1] and b[2i+2]. Other than that, the
> algorithm works as normal. The advantage of doing it this way is that
> while each bubble down still takes O(log N) blocks being touched, you
> get a entire block worth of results for your effort. Make your blocks
> large enough (say, 1/4 the size of workmem) and you greatly reduce N,
> the number of blocks you have to deal with, and get much better I/O
> (when you're reading, you're reading megabytes at a shot).
>
> Now, there are boatloads of complexities I'm glossing over here. This
> is more of a sketch of the idea. But it's something to consider.
It's an interesting idea to work with a "heap of heaps" where you try to
keep each heap page-sized. It reminds me of the B+ tree, where you
collect a whole bunch of nodes into a single page.
I don't know if you have examined weak-heaps, but there are some
interesting results for weak-heap approaches. As you know, heapsort
variants do not degenerate to O(N^2).
On this link:
http://www.jea.acm.org/2002/EdelkampHeapsort/
I highly recommend all the goodies he has embedded (papers, source,
etc.)
From | Date | Subject | |
---|---|---|---|
Next Message | Ron Mayer | 2007-12-20 22:17:51 | Re: Sorting Improvements for 8.4 |
Previous Message | Magnus Hagander | 2007-12-20 20:29:34 | Re: pgwin32_open returning EINVAL |