Re: Fwd: Slow Count-Distinct Query

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Varadharajan Mukundan <srinathsmn(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Fwd: Slow Count-Distinct Query
Date: 2014-04-04 17:59:03
Message-ID: CAMkU=1y9RqS7E7xsVkEqrhJ8ZtzDu1Cj6uwC4p7uWpZ5c5U-JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Apr 4, 2014 at 2:31 AM, Varadharajan Mukundan
<srinathsmn(at)gmail(dot)com>wrote:

> Sorry that i just joined the list and have to break the thread to reply to
> Tom Lane's response on this @
> http://www.postgresql.org/message-id/13741.1396275339@sss.pgh.pa.us
>
>
> Note that the indexscan is actually *slower* than the seqscan so far as
>> the table access is concerned; if the table were big enough to not fit
>> in RAM, this would get very much worse. So I'm not impressed with trying
>> to force the optimizer's hand as you've done here --- it might be a nice
>> plan now, but it's brittle. See if a bigger work_mem improves matters
>> enough with the regular plan.
>
>
> I agree to the point that hand tuning optimiser is brittle and something
> that should not be done. But the reason to that was to force the index-only
> scan (Not the index scan). I feel Index-only scan would speed up given
> postgres is row oriented and we are running count-distinct on a column in a
> table with lot of columns (Say 6-7 in number). I think that is what have
> contributed to the gain in performance.
>

It looks like the original emailer wrote a query that the planner is not
smart enough to plan properly (A known limitation of that kind of query).
He then made a bunch of changes, none of which worked. He then re-wrote
the query into a form for which the planner does a better job on. What we
do not know is, what would happen if he undoes all of those other changes
and *just* uses the new form of the query?

>
> I did a similar test with around 2 million tuples with work_mem = 250 MB
> and got the query to respond with 2x speed up. But the speed-up got with
> index-only scan was huge and response was in sub-seconds whereas with
> work_mem the response was couple of seconds.
>

This change is almost certainly due to the change from a sort to a hash
aggregate, and nothing to do with the index-only scan at all.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message PARIS Nicolas 2014-04-04 19:49:12 Re: PGSQL 9.3 - Materialized View - multithreading
Previous Message Thom Brown 2014-04-04 16:54:07 Re: PGSQL 9.3 - Materialized View - multithreading