Re: Implementing Sorting Refinements

From: <mac_man2005(at)hotmail(dot)it>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing Sorting Refinements
Date: 2008-01-08 00:04:44
Message-ID: BAY132-DS29B177620B90BA0D7A64EE6480@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Well, sorry for hijacking... ummm how did I do that?

Anyway I'll thank you for giving a "sign of life" when I was almost loosing
my hopes to get any kind of answer from "-hackers".

I suppose the lack of answers was due to the way I wrote my mail. At that
moment I supposed that at least someone reminded the 2WRS technique and
possible benefits described into previous posts.
I think I was wrong, so I'll write it once again hoping meanwhile to get
some suggestions on: HOWTO write a mail to which "-hackers" will give an
answer :) hehehehe

Thanks for your attention.
Manolo.

--------------------------------------------------
From: "Decibel!" <decibel(at)decibel(dot)org>
Sent: Tuesday, January 08, 2008 12:34 AM
To: "Manolo _" <mac_man2005(at)hotmail(dot)it>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Implementing Sorting Refinements

> You'll get better response if you don't hijack threads. :) Also, there's
> nothing in here that describes what the benefits of this change are.
>
> On Jan 1, 2008, at 2:09 PM, Manolo _ wrote:
>
>>
>> Hi to all.
>>
>> This mail is aimed at asking some suggestion to face PostgreSQL (PG)
>> development to implement a refinement to PG source code. I'll briefly
>> summarize the idea of the "2-Way Replacement Selection" (2WRS)
>> refinement, just in case. If you already remember what 2WRS is, you can
>> please jump directly to the IMPLEMENTATION ISSUES part of this mail.
>>
>>
>> 2WRS.
>> Refinement of the Replacement Selection (RS) technique currently used by
>> PG in src/backend/utils/sort/tuplesort.c .
>> The 2WRS uses two heaps instead of just one in order to create the
>> current (logical) run. Here there are some fundamental points of the
>> 2WRS technique:
>> - 'qsort' the initial unsorted 'memtuples' array
>> - divide the 'memtuples' array into two parts and each of those will be
>> managed as a heap
>> - one of the heaps will arrange its elements in ascending order, while
>> the other heap in descending order
>> - each heap will spill its current root in its corresponding run (i.e.:
>> we have a run per each of those two heaps), so we are actually creating
>> 2 physical current runs
>> - those 2 current physical runs could "theoretically" be merged into the
>> same logical run, actually we can make 'mergesort' think they do belong
>> to the same physical run. That reduces the number of comparisons
>> 'mergesort' has to do at each merge step (that means less seek delay
>> time on mass storage). We can also think the average length of logical
>> runs produced by 2WRS will probably be greater than the average length
>> produced by RS (again less seek delay time: the longer each run the less
>> number of final runs produced, that means the less number of comparisons
>> at each 'mergesort' step).
>>
>>
>> IMPLEMENTATION ISSUES.
>> Where to place those heaps?
>> 1) I think that both heaps could be arranged on the same 'memtuples'
>> array. That allows easily subsequent resizing those heaps according to
>> their effective use or according to some heuristic, without reallocating
>> memory.
>>
>> How to arrange those heaps?
>> 2a) It's convenient to arrange those heaps "root to root". That is
>> arranging those heaps with their roots toward the center of 'memtuples'
>> (in a way we can say they lay "face to face", or "root to root" as said
>> before) while their leaves lay towards the extreme indexes of the
>> 'memtuples' array (that is the last leaf of one heap will lay at index
>> 0, the last leaf of the other heap laying at index memtupsize-1. This
>> arrangement prevents overlapping elements between those physical runs
>> associated to the same current logical run.
>> PRO: once we qsort memtuples and divide it into 2 parts we already get
>> those two heaps, no need to build them.
>> CONTRA: ???
>>
>> 2b) As in 2a) but arranging heaps "leaf to leaf", that is their roots
>> will lay at the extreme indexes of 'memtuples' while their leaves
>> towards the center of the 'memtuples' array.
>> Or even start building heaps as soon as we get initial elements, instead
>> of qsort the whole 'memtuples' array.
>> Any PRO/CONTRA compared to 2a)???
>>
>> Current run numbers
>> I think I should duplicate the 'int currentRun' variable in the
>> Tuplesortstate struct. I'll replace it with a 'int currentRunUP' and
>> 'int currentRunDOWN' variables in order to distinguish those two
>> physical runs associated to those 2 heaps. In this case I will give a
>> run number (max{currentRunUP,currentRunDOWN} + 1) to elements not
>> belonging to the current logical run. I suppose no need to touch 'long
>> availMem' nor 'long allowedMem' variables nor any others.
>>
>> Heap functions
>> I will duplicate all the heap management functions in order to adapt
>> them to the kind of heap they should be applied to (for example, the
>> tuplesort_heap_siftup function should be replaced with
>> tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions).
>>
>> Merge Plan
>> This technique would use a sort of "merge plan" to instruct mergesort on
>> how to use those physical runs. Actually mergesort should consider at
>> first "odd runs" before "pair runs". That is, for example, mergesort
>> should start merging runs with run number 1,3,5,7,... and when run
>> number X terminates start considering run number X+1. Obviously that
>> doesn't need any merge plan, but I remember someone else (as Simon
>> Riggs) was interested in sorting improvements so it could be a good
>> thing to know if I should consider any conventions or paramethers in
>> order to possibly create that merge plan.
>>
>>
>> DEVELOPMENT CONTEXT
>> I preferred to use the "last stable release" at the moment, that is 8.2.
>>
>>
>> Any comment/suggestion/advice ?
>>
>> Thanks for your attention and for your time.
>> Regards, Manolo.
>> _________________________________________________________________
>> Express yourself instantly with MSN Messenger! Download today it's FREE!
>> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>> http://archives.postgresql.org
>>
>
> --
> Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
> Give your computer some brain candy! www.distributed.net Team #1828
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-01-08 00:06:27 Re: 8.3.0 release schedule (Was:Re: [BUGS] BUG #3852: Could not create complex aggregate)
Previous Message Decibel! 2008-01-07 23:34:05 Re: Implementing Sorting Refinements