Quick Links

Re: select count(distinct ...) is slower than select distinct in about 5x

From:	jacket41142 <jacket41142(at)gmail(dot)com>
To:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: select count(distinct ...) is slower than select distinct in about 5x
Date:	2013-12-11 01:23:03
Message-ID:	CAONnt+5Atj_XvLtzAGGnoUN7_hAqoV-5+cahqkcuXn7VhtYTqg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Thanks very much.

I think another problem is that the cost estimation isn't good enough to
reflex real cost. Since we can see, from "explain analyze ...",
count(distinct ...) has smallest cost between the others, but since it uses
sorts, the time complexity should be higher especially for large amount of
rows.

Also I think even if we can have multiple count() expressions, the
optimizer should also be able to choose between use sort, HashAggregate or
maybe something like linear aggregate if sorts are not needed or other
methods if exist. Also this may be done as just one job for entire table of
interested columns, or for each column separately.

regards,
jacket41142

2013/12/11 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>

> On Tue, Dec 10, 2013 at 9:28 AM, jacket41142 <jacket41142(at)gmail(dot)com>wrote:
>
>
>>
>> test=> select distinct col_int from t1 group by col_int;
>> Time: 1177.936 ms
>>
>> So the performance difference is not very large.
>> But when I do that:
>>
>> test=> select count(distinct col_int) from t1;
>> count
>> -------
>> 1025
>> (1 row)
>>
>> Time: 7367.476 ms
>>
>
>
> count(distinct ...) always sorts, rather than using a hash, to do its
> work. I don't think that there is any fundamental reason that it could not
> be changed to allow it to use hashing, it just hasn't been done yet. It is
> complicated by the fact that you can have multiple count() expressions in
> the same query which demand sorting/grouping on different columns.
>
> Cheers,
>
> Jeff
>

Responses

Re: select count(distinct ...) is slower than select distinct in about 5x at 2013-12-11 01:57:15 from jacket41142
Re: select count(distinct ...) is slower than select distinct in about 5x at 2013-12-11 02:24:57 from Jeff Janes

Browse pgsql-performance by date

	From	Date	Subject
Next Message	jacket41142	2013-12-11 01:57:15	Re: select count(distinct ...) is slower than select distinct in about 5x
Previous Message	Krzysztof Olszewski	2013-12-10 23:30:48	Problem with slow query with WHERE conditions with OR clause on primary keys