Quick Links

Re: select count(distinct ...) is slower than select distinct in about 5x

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	jacket41142 <jacket41142(at)gmail(dot)com>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: select count(distinct ...) is slower than select distinct in about 5x
Date:	2013-12-10 18:19:46
Message-ID:	CAMkU=1wFdvb76NWVw=zLKF3H6-tn=+z9o_Nv2oioWfy6Z8Xo0A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Tue, Dec 10, 2013 at 9:28 AM, jacket41142 <jacket41142(at)gmail(dot)com> wrote:

>
> test=> select distinct col_int from t1 group by col_int;
> Time: 1177.936 ms
>
> So the performance difference is not very large.
> But when I do that:
>
> test=> select count(distinct col_int) from t1;
> count
> -------
> 1025
> (1 row)
>
> Time: 7367.476 ms
>

count(distinct ...) always sorts, rather than using a hash, to do its work.
I don't think that there is any fundamental reason that it could not be
changed to allow it to use hashing, it just hasn't been done yet. It is
complicated by the fact that you can have multiple count() expressions in
the same query which demand sorting/grouping on different columns.

Cheers,

Jeff

In response to

select count(distinct ...) is slower than select distinct in about 5x at 2013-12-10 17:28:22 from jacket41142

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Merlin Moncure	2013-12-10 20:03:12	Re: Parallel Select query performance and shared buffers
Previous Message	Kevin Grittner	2013-12-10 18:16:40	Re: select count(distinct ...) is slower than select distinct in about 5x