Re: DATE_TRUNC() and GROUP BY?

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: DATE_TRUNC() and GROUP BY?
Date: 2014-01-02 19:47:42
Message-ID: CAAcYxUe0QhTpepPH6ZWQTm-0kUY+WY2t_5HHSqDSuGHSo2ORMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Jan 2, 2014 at 12:36 PM, Dave Johansen <davejohansen(at)gmail(dot)com>wrote:

> On Thu, Jan 2, 2014 at 11:39 AM, Dave Johansen <davejohansen(at)gmail(dot)com>wrote:
>
>> On Fri, Dec 20, 2013 at 10:46 PM, David Rowley <dgrowleyml(at)gmail(dot)com>wrote:
>>
>>> On Fri, Dec 20, 2013 at 1:35 PM, Dave Johansen <davejohansen(at)gmail(dot)com>wrote:
>>>
>>>> I just ran into an interesting issue on Postgres 8.4. I have a database
>>>> with about 3 months of data and when I do following query:
>>>> SELECT DATE_TRUNC('day', time) AS time_t, COUNT(*) FROM mytable GROUP
>>>> BY time_t;
>>>>
>>>> EXPLAIN shows that it's doing a sort and then a GroupAggregate. There
>>>> will only be ~90 outputs, so is there a way I can hint/force the planner to
>>>> just do a HashAggregate?
>>>>
>>>> Just to see if it would change the plan, I tried increasing the
>>>> work_mem up to 1GB and it still did the same plan.
>>>>
>>>>
>>> PostgreSQL does not really have any stats on the selectivity of
>>> date_trunc('day', time) so my guess is that it can only assume that it has
>>> the same selectivity as the time column by itself... Which is very untrue
>>> in this case.
>>> The group aggregate plan is chosen here as PostgreSQL thinks the the
>>> hash table is going to end up pretty big and decides that the group
>>> aggregate will be the cheaper option.
>>>
>>> I mocked up your data and on 9.4 I can get the hash aggregate plan to
>>> run if I set the n_distinct value to 90 then analyze the table again.. Even
>>> if you could do this on 8.4 I'd not recommend it as it will probably cause
>>> havoc with other plans around the time column. I did also get the hash
>>> aggregate plan to run if I created a functional index on date_trunc('day',
>>> time) then ran analyze again. I don't have a copy of 8.4 around to see if
>>> the planner will make use of the index in the same way.
>>>
>>
> I just tried this on 8.4 and it won't create the index because
> DATE_TRUNC() is not immutable. The exact error is:
> ERROR: function in index expression must be marked IMMUTABLE
>
> Any suggestions or other ideas?
>

I apologize for the multiple emails, but I just looked at the definition of
DATE_TRUNC() and for TIMESTAMP WITHOUT TIME ZONE it's IMMUTABLE, so I will
look into switching to that and see if using the index speeds up the
queries.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Thomas Mayer 2014-01-02 21:32:01 window function induces full table scan
Previous Message Dave Johansen 2014-01-02 19:36:34 Re: DATE_TRUNC() and GROUP BY?