Summing activity intervals without any obvious column to group by

From: Carey Tilden <carey(dot)tilden(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Summing activity intervals without any obvious column to group by
Date: 2012-08-14 00:28:24
Message-ID: CAEwswh-Eu=vcprDxDd-n1m871u6Cdk+0i14iVuXoOYPfk09Qqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Apologies for the awkward title. I haven't quite thought of the right way
to describe my problem, which may be why I've had a hard time figuring out
how to solve it. I have a list of program start/stop times, and I want to
know how long each run takes to complete. The thing that's really tripping
me up is there are gaps in the sequence. I've figured out how to collapse
the results down to a single row per attempt, but I can't quite figure out
how to further collapse down each full run to its own row. It'd be easy if
I had a session_id or something to group on, but I don't. All I have are
the start/stop times.

Here's some sample data. Hopefully this clarifies what I'm talking about:

drop table if exists program_runs;

create temporary table program_runs (
id serial,
time_stamp timestamptz,
action text
);

insert into program_runs (time_stamp, action) values
('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00 PST',
'stopped early'),
('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00 PST',
'stopped early'),
('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00 PST',
'completed'),
('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00 PST',
'stopped early'),
('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00 PST',
'completed'),
('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00 PST',
'stopped early'),
('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00 PST',
'stopped early'),
('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00 PST',
'stopped early'),
('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00 PST',
'completed');

select
this_time_stamp as starting_time_stamp,
next_time_stamp - this_time_stamp as time_elapsed,
next_action as closing_action
from (
select
time_stamp as this_time_stamp, lead(time_stamp) over (order by
id) as next_time_stamp,
action as this_action, lead(action) over (order by id) as
next_action,
id as this_id, lead(id) over (order by id) as next_id
from program_runs
) q
where this_action = 'started';

Note that each run has a pair of entries in the table. The first is always
"started", but the second may be either "stopped early" or "completed".
The final results I'd like to see are:

starting_time_stamp | total_time_elapsed
------------------------+--------------------
2012-01-01 10:00:00-08 | 00:27:00
2012-01-01 10:50:00-08 | 00:13:00
2012-01-01 11:20:00-08 | 00:39:00

Hope that's enough detail. Any ideas or suggestions gladly accepted!

Regards,
Carey

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andrew Sullivan 2012-08-14 00:53:14 Re: Summing activity intervals without any obvious column to group by
Previous Message dud 2012-08-13 23:29:23 Retrieving the start and end hour queries correctly