From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-04-13 14:40:34
Message-ID: CA+Tgmob3k_m=ietqXccJZgoqeZxjj7WJmnV7HawXT0nQc-VKeA@mail.gmail.com
Lists: pgsql-hackers
On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> ...
>>
>> One thing that seems inefficient to me about our current algorithm is
>> the use of the run number as a leading column in the sort key.
>> There's no real reason why the tuples destined for the next run need
>> to be maintained in heap order; we could just store them unordered and
>> heapify the whole lot of them when it's time to start the next run.
>> That ought to save comparison cycles for the current run, since the
>> heap will be less deep, and there are other possible savings as well -
>> in particular, an unordered array of tuples can be heapify'd in linear
>> time, whereas our current algorithm is O(n lg n). However, when I
>> actually implemented this, I found that doing it this way was a loser.
>> In fact, excluding the next-run tuples from the heap for the current
>> run was a BIG loser even before you add in the time required to
>> re-heapify. This confused the daylights out of me for a while,
>> because it's hard to understand how insert and siftup can be slower on
>> a small heap than a large one.
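[The linear-time construction mentioned above is the classic bottom-up (Floyd) heapify: sift down each internal node, starting from the last one. A minimal illustrative sketch in Python, not the tuplesort.c implementation; the names `sift_down` and `heapify` are ours:]

```python
def sift_down(heap, i, n):
    """Push heap[i] down until the min-heap property holds below it."""
    while True:
        smallest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def heapify(items):
    """Heapify an unordered array in place in O(n), versus the
    O(n lg n) of inserting the n elements one at a time."""
    n = len(items)
    # Leaves are trivially heaps; fix internal nodes from the last up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(items, i, n)
    return items
```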
>>
>> My working theory is that, even though we must necessarily do more
>> comparisons when manipulating a larger heap, many of those comparisons
>> are resolved by comparing run numbers, and that's a lot cheaper than
>> invoking the real comparator. For example, if we insert into a heap
>> whose rightmost child is in the next run, and the new tuple goes into
>> the current run, and the current size of the heap is such that the
>> initial position of the new element is a descendant of the right node,
>> it will very quickly crash through all of the next-run tuples and only
>> need one REAL comparison - against the root. Either the new element
>> ends up as the root, or it ends up as the direct child of the root;
>> now we remove the root and, perhaps, replace it with its rightmost
>> child, meaning that the next element we read in can do the exact same
>> thing all over again. If we exclude the next-run elements from the
>> heap, then the average depth of the heap when inserting a new element
>> is smaller, but all the elements are in the same run, and we have to
>> invoke the real comparator every time. In other words, those next-run
>> tuples act as dummies which effectively create a heap of uneven depth,
>> and the average depth encountered when inserting tuples turns out to
>> be less than what we get by pulling out the dummies and making the
>> depth uniform.
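[The effect described above follows from the comparator's structure: with the run number as the leading key, any comparison between tuples destined for different runs is settled by a cheap integer compare, and the expensive datum comparator runs only within a run. A sketch under our own names (`heap_cmp`, `real_cmp` are illustrative, not PostgreSQL's):]

```python
# Count invocations of the stand-in for the expensive datum comparator.
calls = {"real": 0}

def real_cmp(a, b):
    """Stand-in for the 'real' (expensive) comparator."""
    calls["real"] += 1
    return (a > b) - (a < b)

def heap_cmp(x, y):
    """Compare (run_number, datum) pairs the way the heap does:
    run number first, real comparator only on a tie."""
    if x[0] != y[0]:
        return -1 if x[0] < y[0] else 1   # cross-run: cheap integer compare
    return real_cmp(x[1], y[1])           # same run: expensive compare
```

A current-run tuple crashing through a subtree of next-run tuples thus costs only integer compares until it meets another current-run tuple.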
>>
>> This makes me kind of nervous, because I don't understand why things
>> should work out so well as all that. If throwing some dummy elements
>> into the heap makes things work better, then maybe we should throw in
>> some more. Or maybe it would work better to take some but not all of
>> them out. There certainly doesn't seem to be any indication in the
>> code that this is an anticipated effect, and it seems an awfully
>> providential accident.
>
> Is there a patch or a git repo available for this change? I'd like to
> see if I can analyze that if I get some spare time.
Sorry for the slow response to this. Patch is attached
(crummy-external-sort.patch).
> Clever. Rather than doing a binary search of the path, it might make
> sense to do a reverse search of it. The tuple is likely to end up
> somewhere at the bottom of the heap, both because that is where most
> of the data is, and because the tuple being reinserted just came from
> the bottom so it is likely to be biased that way.
Patches for these approaches also attached (siftup-using-bsearch.patch
and siftup-reverse-linear.patch). Neither approach seemed terribly
promising in my testing, IIRC.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
  crummy-external-sort.patch (application/octet-stream, 4.7 KB)
  siftup-reverse-linear.patch (application/octet-stream, 3.1 KB)
  siftup-using-bsearch.patch (application/octet-stream, 2.9 KB)