Re: Summing activity intervals without any obvious column to group by

From: Carey Tilden <carey(dot)tilden(at)gmail(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Summing activity intervals without any obvious column to group by
Date: 2012-08-14 01:22:34
Message-ID: CAEwswh96qf1mJ7WB941A2AAA+JR_Jc3qhVQ73kgkYhpwAtDLXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <polobo(at)yahoo(dot)com> wrote:

> On Aug 13, 2012, at 20:28, Carey Tilden <carey(dot)tilden(at)gmail(dot)com> wrote:
>
> > Apologies for the awkward title. I haven't quite thought of the right
> way to describe my problem, which may be why I've had a hard time figuring
> out how to solve it. I have a list of program start/stop times, and I want
> to know how long each run takes to complete. The thing that's really
> tripping me up is there are gaps in the sequence. I've figured out how to
> collapse the results down to a single row per attempt, but I can't quite
> figure out how to further collapse down each full run to its own row. It'd
> be easy if I had a session_id or something to group on, but I don't. All I
> have are the start/stop times.
> >
> > Here's some sample data. Hopefully this clarifies what I'm talking
> about:
> >
> > drop table if exists program_runs;
> >
> > create temporary table program_runs (
> > id serial,
> > time_stamp timestamptz,
> > action text
> > );
> >
> > insert into program_runs (time_stamp, action) values
> > ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00
> PST', 'stopped early'),
> > ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00
> PST', 'stopped early'),
> > ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00
> PST', 'completed'),
> > ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00
> PST', 'stopped early'),
> > ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00
> PST', 'completed'),
> > ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00
> PST', 'stopped early'),
> > ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00
> PST', 'stopped early'),
> > ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00
> PST', 'stopped early'),
> > ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00
> PST', 'completed');
> >
> > select
> > this_time_stamp as starting_time_stamp,
> > next_time_stamp - this_time_stamp as time_elapsed,
> > next_action as closing_action
> > from (
> > select
> > time_stamp as this_time_stamp, lead(time_stamp) over (order
> by id) as next_time_stamp,
> > action as this_action, lead(action) over (order by id) as
> next_action,
> > id as this_id, lead(id) over (order by id) as next_id
> > from program_runs
> > ) q
> > where this_action = 'started';
> >
> > Note that each run has a pair of entries in the table. The first is
> always "started", but the second may be either "stopped early" or
> "completed". The final results I'd like to see are:
> >
> > starting_time_stamp | total_time_elapsed
> > ------------------------+--------------------
> > 2012-01-01 10:00:00-08 | 00:27:00
> > 2012-01-01 10:50:00-08 | 00:13:00
> > 2012-01-01 11:20:00-08 | 00:39:00
> >
> > Hope that's enough detail. Any ideas or suggestions gladly accepted!
> >
> > Regards,
> > Carey
>
> First artificially generate row (pair) identifiers by integer dividing the
> ordered row number by 2.
>
> Using window or sub-queries identify the bookends for each group (i.e.,
> the identifier for each completed and the prior completed). Give these
> groups artificial session identifiers/row numbers.
>
> Assign the artificial session id to each transaction row by using the
> bookends.
>

This is the part where I draw a blank. How would I do that? Seems like it
should be easy with window functions, but I just can't think of the way to
do it.

> Now you have identifiers with which to group.
>
> This makes a number of assumptions regarding the form of the input data.
> It will solve for your example data but it may not generalize. In
> particular it assumes non-overlapping sessions.
>

The assumptions hold fairly well. Sessions do not overlap, thankfully.
There are different program runs to untangle, but that's simple enough
(order by program_name, time_stamp).

Thanks for the suggestions so far!

Carey

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2012-08-14 02:05:06 Re: Summing activity intervals without any obvious column to group by
Previous Message David Johnston 2012-08-14 01:01:50 Re: Summing activity intervals without any obvious column to group by