Re: Summing activity intervals without any obvious column to group by

From: Carey Tilden <carey(dot)tilden(at)gmail(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Summing activity intervals without any obvious column to group by
Date: 2012-08-14 03:59:25
Message-ID: CAEwswh_3RHBEetnpUrFWm_jsimZLEQvpXFhkbz3yiyXbrxCGfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Aug 13, 2012 at 7:05 PM, David Johnston <polobo(at)yahoo(dot)com> wrote:

> On Aug 13, 2012, at 21:22, Carey Tilden <carey(dot)tilden(at)gmail(dot)com> wrote:
>
> On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <polobo(at)yahoo(dot)com> wrote:
>
>> On Aug 13, 2012, at 20:28, Carey Tilden <carey(dot)tilden(at)gmail(dot)com> wrote:
>>
>> > Apologies for the awkward title. I haven't quite thought of the right
>> way to describe my problem, which may be why I've had a hard time figuring
>> out how to solve it. I have a list of program start/stop times, and I want
>> to know how long each run takes to complete. The thing that's really
>> tripping me up is there are gaps in the sequence. I've figured out how to
>> collapse the results down to a single row per attempt, but I can't quite
>> figure out how to further collapse down each full run to its own row. It'd
>> be easy if I had a session_id or something to group on, but I don't. All I
>> have are the start/stop times.
>> >
>> > Here's some sample data. Hopefully this clarifies what I'm talking
>> about:
>> >
>> > drop table if exists program_runs;
>> >
>> > create temporary table program_runs (
>> > id serial,
>> > time_stamp timestamptz,
>> > action text
>> > );
>> >
>> > insert into program_runs (time_stamp, action) values
>> > ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00
>> PST', 'stopped early'),
>> > ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00
>> PST', 'stopped early'),
>> > ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00
>> PST', 'completed'),
>> > ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00
>> PST', 'stopped early'),
>> > ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00
>> PST', 'completed'),
>> > ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00
>> PST', 'stopped early'),
>> > ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00
>> PST', 'stopped early'),
>> > ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00
>> PST', 'stopped early'),
>> > ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00
>> PST', 'completed');
>> >
>> > select
>> > this_time_stamp as starting_time_stamp,
>> > next_time_stamp - this_time_stamp as time_elapsed,
>> > next_action as closing_action
>> > from (
>> > select
>> > time_stamp as this_time_stamp, lead(time_stamp) over (order
>> by id) as next_time_stamp,
>> > action as this_action, lead(action) over (order by id) as
>> next_action,
>> > id as this_id, lead(id) over (order by id) as next_id
>> > from program_runs
>> > ) q
>> > where this_action = 'started';
>> >
>> > Note that each run has a pair of entries in the table. The first is
>> always "started", but the second may be either "stopped early" or
>> "completed". The final results I'd like to see are:
>> >
>> > starting_time_stamp | total_time_elapsed
>> > ------------------------+--------------------
>> > 2012-01-01 10:00:00-08 | 00:27:00
>> > 2012-01-01 10:50:00-08 | 00:13:00
>> > 2012-01-01 11:20:00-08 | 00:39:00
>> >
>> > Hope that's enough detail. Any ideas or suggestions gladly accepted!
>> >
>> > Regards,
>> > Carey
>>
>> First artificially generate row (pair) identifiers by integer dividing
>> the ordered row number by 2.
>>
>> Using window or sub-queries identify the bookends for each group (i.e.,
>> the identifier for each completed and the prior completed). Give these
>> groups artificial session identifiers/row numbers.
>>
>> Assign the artificial session id to each transaction row by using the
>> bookends.
>>
>
> This is the part where I draw a blank. How would I do that? Seems like
> it should be easy with window functions, but I just can't think of the way
> to do it.
>
>
> With detail as ()
> , bookmarks as ()
> Select detail.id, bookmarks.id, ...
> From detail
> Join bookmarks on ( detail.id between bookmarks.startid and
> bookmarks.endid )
>
> Bookmarks as (
> Select row_number() over () as id,
> detail.id, coalesce(min(detail.id) over (rows 1 preceeding),0) --coalesce
> for the first row
> From detail where 'completed'
> )
>
> Probably need to play with row ordering but this should get you started.
>

Thanks much! That finally clicked and I now have the results I was after.
I included my latest code as an attachment, since pasting it inline seemed
a bit much. I'm all ears if anyone wants to take a look and suggest any
further refinements.

Cheers,
Carey

>
>> Now you have identifiers with which to group.
>>
>> This makes a number of assumptions regarding the form of the input data.
>> It will solve for your example data but it may not generalize. In
>> particular it assumes non-overlapping sessions.
>>
>
> The assumptions hold fairly well. Sessions do not overlap, thankfully.
> There are different program runs to untangle, but that's simple enough
> (order by program_name, time_stamp).
>
> Thanks for the suggestions so far!
>
> Carey
>
>

Attachment Content-Type Size
program_runs_testbed.sql application/octet-stream 2.6 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Rajeev rastogi 2012-08-14 04:58:13 Regarding pc-lint on PostgreSQL code
Previous Message David Johnston 2012-08-14 02:05:06 Re: Summing activity intervals without any obvious column to group by