From: | tmp <skrald(at)amossen(dot)dk> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Optimizing DISTINCT with LIMIT |
Date: | 2008-12-04 13:32:51 |
Message-ID: | gh8m5v$7oj$1@news.hub.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
As far as I have understood the following query
SELECT DISTINCT foo
FROM bar
LIMIT baz
is done by first sorting the input and then traversing the sorted data,
ensuring uniqueness of output and stopping when the LIMIT threshold is
reached. Furthermore, a part of the sort procedure is to traverse input
at least one time.
Now, if the input is large but the LIMIT threshold is small, this
sorting step may increase the query time unnecessarily so here is a
suggestion for optimization:
If the input is "sufficiently" large and the LIMIT threshold
"sufficiently" small, maintain the DISTINCT output by hashning while
traversing the input and stop when the LIMIT threshold is reached. No
sorting required and *at* *most* one read of input.
Use case: Websites that needs to present small samples of huge queries fast.
From | Date | Subject | |
---|---|---|---|
Next Message | Gregory Stark | 2008-12-04 13:48:05 | Re: In-place upgrade: catalog side |
Previous Message | Gregory Stark | 2008-12-04 13:21:37 | Assertion failure in new outer/semi/anti join code |