Quick Links

Re: Optimize ORDER BY ... LIMIT

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Optimize ORDER BY ... LIMIT
Date:	2006-09-15 19:35:18
Message-ID:	87ac50q5xl.fsf@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:

> On Fri, Sep 15, 2006 at 05:30:27PM +0100, Gregory Stark wrote:
>> Also, because heap sort is slower than qsort (on average anyways) it makes
>> sense to not bother with the heap until the number of tuples grows well beyond
>> the limit or until it would otherwise spill to disk.
>
> The thought that comes to mind is to leave the sorting as is, but
> change the code that writes to the tapes to stop writing once it hits
> the limit. So each tape will never have more than N tuples, where N is
> the limit. This would be fairly unobtrusive because none of the other
> code actually needs to care.

I'm sorry, I forgot to mention that part of my analysis. Once you reach disk
any chance of optimising the limit case is pretty much out the window. You
could trim some tuples from each tape but unless you're sorting truly
stupendous amounts of data I doubt it would really help much.

I think it only makes sense to look at the in-memory case. Instead of qsorting
thousands of records or, worse, spilling millions of records to disk and doing
an external sort only to use only the top 10 and throw the rest away, we throw
tuples away before they accumulate in memory in the first place.

>> Alternatively we could have Limit(Sort()), Unique(Sort()), and
>> Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not
>> introduce the Limit and Unique nodes at all.
>
> I don't think it's easy to merge Unique and Sort, mainly because the
> fields you're doing the Unique on are probably not the fields you're
> sorting on, so you're probably not saving much.

On the contrary I think the vast majority of the time you have a Unique(Sort)
it will be the same key because it will be caused by a SELECT DISTINCT. Now
that I actually test it I see there are more nodes that could do implement
this (namely GroupAgg) so I'm thinking more and more about having an abstract
way to pass information down through the nodes rather than handle just
Limit/Sort specifically.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Re: Optimize ORDER BY ... LIMIT at 2006-09-15 19:21:51 from Martijn van Oosterhout

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2006-09-15 19:37:07	Release notes
Previous Message	Bruce Momjian	2006-09-15 19:30:37	Re: Release notes