Re: Best way to aggregate sum for each month

From: Naresh Kumar <narain337(at)gmail(dot)com>
To: Andreas Joseph Krogh <andreas(at)visena(dot)com>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Best way to aggregate sum for each month
Date: 2015-04-18 00:05:00
Message-ID: CAHuWDLjijcprUDJ1bGAdxPBWLUMdEu71evQa2GQP9tRef2SHfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I wrote it in the form of cte for readability purpose.

If you are aggregating by month date and sure that there will be atleast
one record for every month you don't require to generate the series and
JOIN.

On Fri, Apr 17, 2015 at 6:46 PM, Andreas Joseph Krogh <andreas(at)visena(dot)com>
wrote:

> From what I can see your query performs worse, as CTEs are
> optimisation-fences. Your query needs to scans thru all activity_log
> entries, then do Merge Left Join .
> Maybe there's no way to optimize the Merge Left Join node which takes
> ~300ms?
>
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas(at)visena(dot)com
> ​​
> www.visena.com
> <https://www.visena.com>
>
>
> På lørdag 18. april 2015 kl. 01:29:37, skrev Naresh Kumar <
> narain337(at)gmail(dot)com>:
>
> Try this,
>
>
> ;WITH cte as
> (
> SELECT date_trunc('month', log.start_date::timestamp without time zone) AS
> log_date, sum(log.duration) / (3600 * 1000)::NUMERIC as total_duration
> FROM activity_log
> GROUP BY date_trunc('month', log.start_date::timestamp without time zone)
> )
>
> SELECT q.start_date, cte.total_duration
> FROM
> (SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' ::
> DATE, '1 month') AS DATE) as start_date) AS q
> LEFT OUTER JOIN cte ON cte.log_date = q.start_date
>
> On Fri, Apr 17, 2015 at 5:10 PM, Andreas Joseph Krogh <andreas(at)visena(dot)com>
> wrote:
>>
>> Hi all.
>>
>> I'm using PG-9.4.
>>
>> I'm having a table, activity_log, which holds activities with duration
>> for a given date.
>> I'm trying to sum the duration for each month in a given year and am
>> currently doing it like this:
>>
>>
>> select q.start_date
>> , sum(log.duration) / (3600 * 1000)::NUMERIC as total_durationFROM (SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' :: DATE, '1 month') AS DATE) as start_date) AS q
>> LEFT OUTER JOIN activity_log log ON date_trunc('month', log.start_date::timestamp without time zone) = q.start_dateGROUP BY q.start_dateORDER BY q.start_date
>> ;
>>
>> I have the current index defined:
>>
>>
>> create index activity_start_month ON activity_log(date_trunc('month', start_date::timestamp without time zone));
>>
>>
>>
>> Here's the explain plan:
>>
>>
>> QUERY PLAN
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> GroupAggregate (cost=65.27..91102.17 rows=200 width=12) (actual
>> time=179.983..344.517 rows=12 loops=1)
>> Group Key: ((generate_series(('2014-01-01'::date)::timestamp with time
>> zone, ('2014-12-01'::date)::timestamp with time zone, '1
>> mon'::interval))::date)
>> -> Merge Left Join (cost=65.27..82950.79 rows=1629675 width=12)
>> (actual time=163.894..310.011 rows=143551 loops=1)
>> Merge Cond: (((generate_series(('2014-01-01'::date)::timestamp
>> with time zone, ('2014-12-01'::date)::timestamp with time zone, '1
>> mon'::interval))::date) = date_trunc('month'::text,
>> (log.start_date)::timestamp without time zone))
>> -> Sort (cost=64.84..67.34 rows=1000 width=4) (actual
>> time=0.045..0.052 rows=12 loops=1)
>> Sort Key:
>> ((generate_series(('2014-01-01'::date)::timestamp with time zone,
>> ('2014-12-01'::date)::timestamp with time zone, '1 mon'::interval))::date)
>> Sort Method: quicksort Memory: 25kB
>> -> Result (cost=0.00..5.01 rows=1000 width=0) (actual
>> time=0.017..0.031 rows=12 loops=1)
>> -> Materialize (cost=0.42..51097.29 rows=325935 width=12)
>> (actual time=0.032..227.692 rows=323180 loops=1)
>> -> Index Scan using activity_start_month on activity_log
>> log (cost=0.42..50282.45 rows=325935 width=12) (actual time=0.029..185.746
>> rows=323180 loops=1)
>> Planning time: 0.201 ms
>> Execution time: 344.645 ms
>> (12 rows)
>>
>>
>> Are there any ways to improve this?
>>
>> Thanks.
>>
>> --
>> *Andreas Joseph Krogh*
>> CTO / Partner - Visena AS
>> Mobile: +47 909 56 963
>> andreas(at)visena(dot)com
>> www.visena.com
>> <https://www.visena.com>
>>
>
>
>
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas(at)visena(dot)com
> www.visena.com
> <https://www.visena.com>
>
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Suresh Raja 2015-04-18 05:30:51 function to send email with query results
Previous Message David G. Johnston 2015-04-17 23:55:31 Re: Best way to aggregate sum for each month