From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
Cc: | Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Randomisation for ensuring nlogn complexity in quicksort |
Date: | 2013-07-01 20:12:16 |
Message-ID: | CA+Tgmoa+4jZf28DQvzqVMWNA5+xeWgcqMcw+5LcDkXwdANWJ=g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jul 1, 2013 at 3:54 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Mon, Jul 1, 2013 at 4:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> This shouldn't be too complex, and should give us a fixed nlogn complexity even for wild data sets, without affecting existing normal data sets that are present in every day transactions. I even believe that those data sets will also benefit from the above optimisation.
>>
>> The only method of selecting a pivot for quicksort that obtain O(n lg
>> n) run time with 100% certainty is have a magical oracle inside the
>> computer that tells you in fixed time and with perfect accuracy which
>> pivot you should select.
>
> Doesn't a linear median algorithm (say median of medians) get you O(n lg n)?
>
> Granted, with a huge constant (I think 4)... but it should still be O(n lg n).
No. Thinking about this a little more, I believe the way it works
out is that any algorithm for picking the median that guarantees that
a certain *percentage* of the tuples will be in the smaller partition
will have O(n lg n) complexity, but any algorithm that only guarantees
that a fixed *number* of tuples in the smaller partition is still
quadratic in complexity. In the case of a median algorithm, you're
only guaranteed to have 2 elements in the smaller partition, which is
a constant. If you take a median of medians, you're guaranteed to
have 8 elements in the smaller partition, which is bigger, but still a
constant.
The reason why this doesn't matter much in practice is because the
data distribution that causes quadratic behavior for median-of-medians
is not one which is likely to occur in the real world and will
probably only come up if chosen by an adversary who is attempting to
make your life miserable.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2013-07-01 20:20:18 | Re: "pg_ctl promote" exit status |
Previous Message | Joshua D. Drake | 2013-07-01 20:10:23 | Re: Documentation/help for materialized and recursive views |