Re: DISTINCT vs. GROUP BY

From: Thom Brown <thombrown(at)gmail(dot)com>
To: Dimi Paun <dimi(at)lattica(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: DISTINCT vs. GROUP BY
Date: 2010-02-09 22:22:17
Message-ID: bddc86151002091422k7e401785t2407f6227643d1dd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 9 February 2010 21:46, Dimi Paun <dimi(at)lattica(dot)com> wrote:
> >From what I've read on the net, these should be very similar,
> and should generate equivalent plans, in such cases:
>
> SELECT DISTINCT x FROM mytable
> SELECT x FROM mytable GROUP BY x
>
> However, in my case (postgresql-server-8.1.18-2.el5_4.1),
> they generated different results with quite different
> execution times (73ms vs 40ms for DISTINCT and GROUP BY
> respectively):
>
> tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
>                                                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=775.68..775.69 rows=1 width=146) (actual time=40.058..40.058 rows=0 loops=1)
>   ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=40.055..40.055 rows=0 loops=1)
>         Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
>         ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=40.050..40.050 rows=0 loops=1)
>               Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
>  Total runtime: 40.121 ms
>
> tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00109';
>                                                                    QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=786.63..788.06 rows=1 width=146) (actual time=73.018..73.018 rows=0 loops=1)
>   ->  Sort  (cost=786.63..787.34 rows=286 width=146) (actual time=73.016..73.016 rows=0 loops=1)
>         Sort Key: userdata
>         ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=72.940..72.940 rows=0 loops=1)
>               Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
>               ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=72.936..72.936 rows=0 loops=1)
>                     Index Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
>  Total runtime: 73.144 ms
>
> What gives?
>
Firstly, the 2 queries aren't equal. They're matching against
different clientrmainid values.

Also, look at the bitmap index scan for each:

Bitmap Index Scan on idx_tagdata_clientrmainid (cost=0.00..4.00
rows=286 width=0) (actual time=40.050..40.050 rows=0 loops=1)

Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00
rows=286 width=0) (actual time=72.936..72.936 rows=0 loops=1)

That's where the difference is. An identical scan takes longer in one
than the other, either due to the index scan looking for different
values in each case, or at the time you were running it, another
process was using more resources. You'd have to run these several
times to get an idea of average times.

Have you run ANALYZE on the table beforehand to make sure your stats
are up to date?

Regards

Thom

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-02-09 22:38:21 Re: DISTINCT vs. GROUP BY
Previous Message Dimi Paun 2010-02-09 21:46:16 DISTINCT vs. GROUP BY