From: | salah jubeh <s_jubeh(at)yahoo(dot)com> |
---|---|
To: | pgsql <pgsql-general(at)postgresql(dot)org> |
Subject: | SELECT DISTINCT |
Date: | 2013-01-17 20:45:32 |
Message-ID: | 1358455532.88146.YahooMailNeo@web122202.mail.ne1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello Guys,
During my work, I have seen a common practice of using DISTINCT . Some will argue that developer should know the effect of using it, but keep in mind not all developers are gurus in RDBMs. Normally, developers work in a narrow domain. Using DISTINCT might lead to a huge performance degradation because of sort and filter or hashaggregate operations. I think also the rules in determining if the distinct is requiered or not is moderate in complexity.
Example: Please see how much extra cost we have for 119 record
EXPLAIN ANALYZE SELECT DISTINCT * FROM pg_aggregate;
"HashAggregate (cost=3.98..5.17 rows=119 width=28) (actual time=0.525..0.743 rows=119 loops=1)"
" -> Seq Scan on pg_aggregate (cost=0.00..2.19 rows=119 width=28) (actual time=0.011..0.195 rows=119 loops=1)"
"Total runtime: 1.008 ms"
I think any query that returns a unique column (primary key, unique) which is not duplicated in some way (join) can use this optimisation technique.
EXAMPLE:
TABLE A (a1 (uinque), a2, ... , an)
SELECT DISTINCT a1, subset of (a2...an) FROM A; -- will return always a distinct result.
When it comes to joins and nested queries , I do not have clear idea how this can be implemented. But I could do some search.
Regards
From | Date | Subject | |
---|---|---|---|
Next Message | Kirk Wythers | 2013-01-17 20:58:45 | Re: speeding up a join query that utilizes a view |
Previous Message | Paul Jungwirth | 2013-01-17 19:49:37 | Re: Determine if an index is a B-tree, GIST, or something else? |