From: | Luke Lonergan <llonergan(at)greenplum(dot)com> |
---|---|
To: | Francisco Reyes <lists(at)stringsutils(dot)com>, Pgsql performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Group by more efficient than distinct? |
Date: | 2008-04-21 05:35:58 |
Message-ID: | C43175CE.5B957%llonergan@greenplum.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi Francisco,
Generally, PG sorting is much slower than hash aggregation for performing
the distinct operation. There may be small sizes where this isn¹t true, but
for large amounts of data (in-memory or not), hash agg (used most often, but
not always by GROUP BY) is faster.
We¹ve implemented a special optimization to PG sorting that does the
distinct processing within the sort, instead of afterward, but it¹s limited
to some small-ish number (10,000) of distinct values due to it¹s use of a
memory and processing intensive heap.
So, you¹re better off using GROUP BY and making sure that the planner is
using hash agg to do the work.
- Luke
On 4/17/08 8:46 PM, "Francisco Reyes" <lists(at)stringsutils(dot)com> wrote:
> I am trying to get a distinct set of rows from 2 tables.
> After looking at someone else's query I noticed they were doing a group by
> to obtain the unique list.
>
> After comparing on multiple machines with several tables, it seems using
> group by to obtain a distinct list is substantially faster than using
> select distinct.
>
> Is there any dissadvantage of using "group by" to obtain a unique list?
>
> On a small dataset the difference was about 20% percent.
>
> Group by
> HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual
> time=76.641..85.167 rows=2890 loops=1)
>
> Distinct
> Unique (cost=1088.23..1174.53 rows=1151 width=8) (actual
> time=90.516..140.123 rows=2890 loops=1)
>
> Although I don't have the numbers here with me, a simmilar result was
> obtaining against a query that would return 100,000 rows. 20% and more
> speed differnce between "group by" over "select distinct".
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
From | Date | Subject | |
---|---|---|---|
Next Message | Adrian Moisey | 2008-04-21 09:50:31 | connections slowing everything down? |
Previous Message | Tom Lane | 2008-04-21 01:30:50 | Re: corrupted shared memory message |