Re: select count(distinct ...) is slower than select distinct in about 5x

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: jacket41142 <jacket41142(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: select count(distinct ...) is slower than select distinct in about 5x
Date: 2013-12-11 02:24:57
Message-ID: CAMkU=1wgFrZyHpUtgunUJu9Z5X+FC0Gd6La1Uc5v0AQS65=cfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tuesday, December 10, 2013, jacket41142 wrote:

> Thanks very much.
>
> I think another problem is that the cost estimation isn't good enough to
> reflex real cost. Since we can see, from "explain analyze ...",
> count(distinct ...) has smallest cost between the others, but since it uses
> sorts, the time complexity should be higher especially for large amount of
> rows.
>

That one is easy to explain. The cost estimate is not intended to be an
absolute estimate, it is just an relative estimate to choose between
alternatives. Since the current implementation of count(distinct ...)
does not *have* any alternatives for that step in the process, there is no
point in estimating a cost for it. So part of providing it with
alternatives will have to be providing those cost estimates as well.

> Also I think even if we can have multiple count() expressions, the
> optimizer should also be able to choose between use sort, HashAggregate or
> maybe something like linear aggregate if sorts are not needed or other
> methods if exist. Also this may be done as just one job for entire table of
> interested columns, or for each column separately.
>

Right. I hope this gets fixed. It's been on my todo list for a while, but
at the current rate of going through my todo list, it will takes a few
decades to get to if it is left up to me....

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2013-12-11 04:54:36 Re: Debugging shared memory issues on CentOS
Previous Message jacket41142 2013-12-11 01:57:15 Re: select count(distinct ...) is slower than select distinct in about 5x