From: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
---|---|
To: | "Dann Corbit" <DCorbit(at)connx(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Jerry Sievers" <jerry(at)jerrysievers(dot)com> |
Subject: | Re: qsort, once again |
Date: | 2006-03-17 01:12:35 |
Message-ID: | D425483C2C5C9F49B5B7A41F8944154757D68B@postal.corporate.connx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Dann Corbit
> Sent: Thursday, March 16, 2006 4:42 PM
> To: Tom Lane
> Cc: Jonah H. Harris; pgsql-hackers(at)postgresql(dot)org; Jerry Sievers
> Subject: Re: [HACKERS] qsort, once again
>
> > So my feeling is we should just remove the swap_cnt code and return
to
> > the original B&M algorithm. Being much faster than expected for
> > presorted input doesn't justify being far slower than expected for
> > other inputs, IMHO. In the context of Postgres I doubt that
perfectly
> > sorted input shows up very often anyway.
> >
> > Comments?
>
> Checking for presorted input is O(n).
> If the input is random, an average of 3 elements will be tested.
> So adding an in-order check of the data should not be too expensive.
>
> I would benchmark several approaches and see which one is best when
used
> in-place.
Even if "hunks" of the input are sorted, the test is a very good idea.
Recall that we are sorting recursively and so we divide the data into
chunks.
Consider an example...
Quicksort of a field that contains Sex as 'M' for male, 'F' for female,
or NULL for unknown.
The median selection is going to pick one of 'M', 'F', or NULL.
After pass 1 of qsort we will have two partitions. One partition will
have all of one type and the other partition will have the other two
types.
An in-order check will tell us that the monotone partition is sorted and
we are done with it.
Imagine also a table that was clustered but for which we have not
updated statistics. Perhaps it is 98% sorted. Checking for order in
our partitions is probably a good idea.
I think you could also get a good optimization if you are checking for
partitions and find a big section of the partition is not ordered (even
though the whole thing is not). If you could perk the ordered size up
the tree, you could just add another partition to the merge list and
sort the unordered part.
In "C Unleashed" I call this idea partition discovery mergesort.
From | Date | Subject | |
---|---|---|---|
Next Message | William ZHANG | 2006-03-17 01:33:03 | Re: Bug report form: locale/encoding |
Previous Message | Dann Corbit | 2006-03-17 00:42:20 | Re: qsort, once again |