From: Francois Deliege <fdeliege(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Distinct + Limit
Date: 2012-03-27 20:54:36
Message-ID: 16737833.463.1332881676120.JavaMail.geo-discussion-forums@pbcpw7
Lists: pgsql-performance
Hi list,
I have the following table with millions of rows:
CREATE TABLE table1
(
  col1 text,
  col2 text,
  col3 text,
  col4 text,
  col5 text,
  col6 text
);
Performing any of the following queries results in a full sequential scan, followed by a hash aggregate, and then the limit:

select col1 from table1 group by col1 limit 1;
select distinct on (col1) col1 from table1 limit 1;
select col1 from table1 group by col1 limit 2;
select distinct on (col1) col1 from table1 limit 2;

An obvious optimization would be to stop the sequential scan as soon as the limit has been satisfied. Am I missing something?
Limit (cost=2229280.06..2229280.08 rows=2 width=8)
-> HashAggregate (cost=2229280.06..2229280.21 rows=15 width=8)
-> Seq Scan on table1 (cost=0.00..2190241.25 rows=15615525 width=8)
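For what it's worth, the only workaround I have found so far is to emulate a loose index scan with a recursive CTE. A rough sketch is below, assuming a btree index on col1 (the index name is just illustrative, and I have not benchmarked this on the table above); it is exactly the kind of query rewrite I was hoping to avoid:

-- rough sketch, assuming an index on col1; each distinct value is fetched
-- by a separate index probe instead of scanning the whole table
create index table1_col1_idx on table1 (col1);

with recursive distinct_col1 as (
  (select col1 from table1 order by col1 limit 1)
  union all
  select (select t.col1
          from table1 t
          where t.col1 > d.col1
          order by t.col1
          limit 1)
  from distinct_col1 d
  where d.col1 is not null
)
select col1 from distinct_col1 where col1 is not null limit 2;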
Similarly, the following query results in a sequential scan:
select * from table1 where col1 <> col1;
This query is generated by the Sequel database abstraction layer in Ruby when filtering records on an empty array of values. We fixed this by handling that case on the client side, but had originally expected the server to rewrite it and return an empty result set.
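For comparison, a constant-false predicate does get folded away by the planner and returns immediately without touching the table. This is only an illustration of the behavior I expected, not what the library actually emits:

-- planner reduces this to a one-time false filter; no table scan is performed
select * from table1 where false;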
I would greatly appreciate any help on speeding up these queries without having to rewrite them on the client side.
Thanks,
Francois