From: | Gregory Stark <stark(at)enterprisedb(dot)com> |
---|---|
To: | "Francisco Reyes" <lists(at)stringsutils(dot)com> |
Cc: | "Pgsql performance" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Group by more efficient than distinct? |
Date: | 2008-04-18 09:36:02 |
Message-ID: | 87ve2f7bel.fsf@oxford.xeocode.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
"Francisco Reyes" <lists(at)stringsutils(dot)com> writes:
> Is there any dissadvantage of using "group by" to obtain a unique list?
>
> On a small dataset the difference was about 20% percent.
>
> Group by
> HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual
> time=76.641..85.167 rows=2890 loops=1)
HashAggregate needs to store more values in memory at the same time so it's
not a good plan if you have a lot of distinct values.
But the planner knows that and so as long as your work_mem is set to a
reasonable size (keeping in mind each sort or other operation feels free to
use that much itself even if there are several in the query) and the rows
estimate is reasonable accurate -- here it's mediocre but not dangerously bad
-- then if the planner is picking it it's probably a good idea.
I'm not sure but I think there are cases where the DISTINCT method wins too.
This is basically a bug, in an ideal world both queries would generate
precisely the same plans since they're equivalent. It's just not a high
priority since we have so many more interesting improvements competing for
time.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
From | Date | Subject | |
---|---|---|---|
Next Message | PFC | 2008-04-18 10:35:04 | Re: Group by more efficient than distinct? |
Previous Message | Thomas Pundt | 2008-04-18 08:25:04 | Re: Group by more efficient than distinct? |