Re: Best way to aggregate sum for each month

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andreas Joseph Krogh <andreas(at)visena(dot)com>
Cc: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Best way to aggregate sum for each month
Date: 2015-04-17 23:55:31
Message-ID: CAKFQuwaHyLjqTxbrju=vMFxDnSb=5oypORZUkJeE8=frMBshqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Friday, April 17, 2015, Andreas Joseph Krogh <andreas(at)visena(dot)com> wrote:

> Hi all.
>
> I'm using PG-9.4.
>
> I'm having a table, activity_log, which holds activities with duration for
> a given date.
> I'm trying to sum the duration for each month in a given year and am
> currently doing it like this:
>
>
> select q.start_date
> , sum(log.duration) / (3600 * 1000)::NUMERIC as total_durationFROM (SELECT cast(generate_series('2014-01-01' :: DATE, '2014-12-01' :: DATE, '1 month') AS DATE) as start_date) AS q
> LEFT OUTER JOIN activity_log log ON date_trunc('month', log.start_date::timestamp without time zone) = q.start_dateGROUP BY q.start_dateORDER BY q.start_date
> ;
>
> I have the current index defined:
>
>
> create index activity_start_month ON activity_log(date_trunc('month', start_date::timestamp without time zone));
>
>
>
> Here's the explain plan:
>
>
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> GroupAggregate (cost=65.27..91102.17 rows=200 width=12) (actual
> time=179.983..344.517 rows=12 loops=1)
> Group Key: ((generate_series(('2014-01-01'::date)::timestamp with time
> zone, ('2014-12-01'::date)::timestamp with time zone, '1
> mon'::interval))::date)
> -> Merge Left Join (cost=65.27..82950.79 rows=1629675 width=12)
> (actual time=163.894..310.011 rows=143551 loops=1)
> Merge Cond: (((generate_series(('2014-01-01'::date)::timestamp
> with time zone, ('2014-12-01'::date)::timestamp with time zone, '1
> mon'::interval))::date) = date_trunc('month'::text,
> (log.start_date)::timestamp without time zone))
> -> Sort (cost=64.84..67.34 rows=1000 width=4) (actual
> time=0.045..0.052 rows=12 loops=1)
> Sort Key: ((generate_series(('2014-01-01'::date)::timestamp
> with time zone, ('2014-12-01'::date)::timestamp with time zone, '1
> mon'::interval))::date)
> Sort Method: quicksort Memory: 25kB
> -> Result (cost=0.00..5.01 rows=1000 width=0) (actual
> time=0.017..0.031 rows=12 loops=1)
> -> Materialize (cost=0.42..51097.29 rows=325935 width=12)
> (actual time=0.032..227.692 rows=323180 loops=1)
> -> Index Scan using activity_start_month on activity_log
> log (cost=0.42..50282.45 rows=325935 width=12) (actual time=0.029..185.746
> rows=323180 loops=1)
> Planning time: 0.201 ms
> Execution time: 344.645 ms
> (12 rows)
>
>
> Are there any ways to improve this?
>
>
I would usually turn activity_log log into (select dt, sum(...) from
activity_log group by dt) log

And then left join with coalesce to build the final output.

Work_mem adjustment may help too.

David J.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Naresh Kumar 2015-04-18 00:05:00 Re: Best way to aggregate sum for each month
Previous Message Andreas Joseph Krogh 2015-04-17 23:46:50 Re: Best way to aggregate sum for each month