Re: need help with some aggregation magic

From: Edgardo Portal <egportal2002(at)yahoo(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: need help with some aggregation magic
Date: 2011-06-09 21:22:57
Message-ID: isrdjh$ugu$1@dont-email.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 2011-06-09, Andreas <maps(dot)on(at)gmx(dot)net> wrote:
> Am 09.06.2011 18:20, schrieb Richard Broersma:
>> On Thu, Jun 9, 2011 at 6:43 AM, Andreas<maps(dot)on(at)gmx(dot)net> wrote:
>>
>>> I have a log-table that stores events of users and projects like this
>>> ( user_id integer, project_id integer, ts timestamp, event_type integer )
>>>
>>> I need an aggregated list of worktime per user, per project, per day.
>>>
>>> The users can switch projects during the day so I can't work this out with
>>> min(ts) and max(ts).
>> SELECT user_id, project_id, date_trunc( 'day', ts ) as event_day,
>> MIN( ts ) AS event_start, MAX( ts ) AS event_end,
>> MAX( ts ) - MIN( ts ) AS duration
>> FROM Loggingtable
>> GROUP BY user_id, project_id, date_trunc( 'day', ts )
>> ORDER BY date_trunc( 'day', ts ), user_id, project_id;
>>
> As far as I understand you calculate the duration as the difference
> between the first and last event of a project per day.
> There is a problem because a user can work from 08.00 to 10.00 on
> project 1 and then from 10.00 to 12.00 on project 2 and then from 12.00
> to 16.00 on project 1 again.
> Then I get project 1 8 hours plus project 2 2 hours though the
> user actually was just 8 hours there.

Unclear to me what to do with the last event of the day (i.e. is there
a implicit end-of-work time to use, or does your usage pattern
guarantee a "closing event" to always be present?), but what about
something like:

BEGIN ;

CREATE TABLE upes (
uid integer
,pid integer
,ts timestamp without time zone
,evtype integer
) ;

-- ...user 100 works on 3 projects Monday
INSERT INTO upes VALUES(100, 11, '2011-06-06 13:00', 1) ;
INSERT INTO upes VALUES(100, 11, '2011-06-06 13:30', 2) ;
INSERT INTO upes VALUES(100, 22, '2011-06-06 13:45', 1) ;
INSERT INTO upes VALUES(100, 33, '2011-06-06 18:00', 1) ;

-- ...user 100 works on 2 projects Tuesday
INSERT INTO upes VALUES(100, 11, '2011-06-07 13:00', 1) ;
INSERT INTO upes VALUES(100, 33, '2011-06-07 13:30', 2) ;
INSERT INTO upes VALUES(100, 33, '2011-06-07 17:45', 1) ;

-- ...user 200 works also works on Tuesday
INSERT INTO upes VALUES(200, 11, '2011-06-07 13:00', 1) ;
INSERT INTO upes VALUES(200, 33, '2011-06-07 13:30', 2) ;
INSERT INTO upes VALUES(200, 33, '2011-06-07 13:45', 4) ;
INSERT INTO upes VALUES(200, 33, '2011-06-07 19:45', 8) ;

-- ...attempt to summarize
WITH ius AS (
SELECT upes.ts AS uts, upes.uid, upes.pid, upes.evtype, upes_next.ts AS nts
FROM upes
LEFT JOIN upes upes_next
ON upes_next.ts = (
SELECT min(utmp.ts)
FROM upes utmp
WHERE utmp.uid=upes.uid
AND utmp.ts > upes.ts
AND utmp.ts::date=upes.ts::date
)
)
SELECT ius.uts::date, ius.uid, ius.pid, SUM(nts-uts)
FROM ius
GROUP BY 1,2,3
ORDER BY 1,2,3
;

ROLLBACK ;

with results of:
uts | uid | pid | sum
------------+-----+-----+----------
2011-06-06 | 100 | 11 | 00:45:00
2011-06-06 | 100 | 22 | 04:15:00
2011-06-06 | 100 | 33 |
2011-06-07 | 100 | 11 | 01:00:00
2011-06-07 | 100 | 33 | 04:15:00
2011-06-07 | 200 | 11 | 01:00:00
2011-06-07 | 200 | 33 | 06:15:00
(7 rows)

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Kevin Crain 2011-06-09 21:39:01 Re: need help with some aggregation magic
Previous Message Andreas 2011-06-09 16:55:38 Re: need help with some aggregation magic