Re: Group by range in hour of day

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Israel Brewster <israel(at)ravnalaska(dot)net>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org general" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Group by range in hour of day
Date: 2015-03-16 23:46:33
Message-ID: 55076B59.2000600@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 03/16/2015 04:16 PM, Israel Brewster wrote:
> On Mar 16, 2015, at 2:22 PM, David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com <mailto:david(dot)g(dot)johnston(at)gmail(dot)com>> wrote:
>>
>> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
>> <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>>wrote:
>>
>> On 03/16/2015 02:57 PM, Israel Brewster wrote:
>>
>> I have a table with two timestamp columns for the start time
>> and end
>> time of each record (call them start and end).I'm trying to
>> figure out
>> if there is a way to group these records by "hour of day",
>> that is the
>> record should be included in the group if the hour of the day
>> for the
>> group falls anywhere in the range [start,end]. Obviously each
>> record may
>> well fall into multiple groups under this scenario.
>>
>> The goal here is to figure out, for each hour of the day, a)
>> what is the
>> total number of "active" records for that hour, and b) what is
>> the total
>> "active" time for those records during the hour, with an
>> ultimate goal
>> of figuring out the average active time per record per hour.
>>
>> So, for simplified example, if the table contained three records:
>>
>> start | end
>> ------------------------------__-----------------------
>> 2015-03-15 08:15 | 2015-03-15 10:45
>> 2015-03-15 09:30 | 2015-03-15 10:15
>> 2015-03-15 10:30 | 2015-03-15 11:30
>>
>>
>> Then the results should break out something like this:
>>
>> hour | count | sum
>> -----------------------------
>> 8 | 1 | 0.75
>> 9 | 2 | 1.5
>> 10 | 3 | 1.5
>> 11 | 1 | 0.5
>>
>> I can then easily manipulate these values to get my ultimate
>> goal of the
>> average, which would of course always be less than or equal to
>> 1. Is
>> this doable in postgress? Or would it be a better idea to
>> simply pull
>> the raw data and post-process in code? Thanks.
>>
>>
>> Do not have an answer for you, but a question:
>>
>> What version of Postgres are you on?
>>
>> This will help determine what tools are available to work with.
>>
>>
>> ​The following will give you endpoints for your bounds. Version is
>> important since "range types" could be very useful in this situation -
>> but you'd still need to generate the bounds info regardless.​
>>
>> ​
>> SELECT *
>> FROM
>> (SELECT * FROM generate_series('2015-03-15'::timestamptz,
>> '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
>> CROSS JOIN
>> (SELECT end_ts + '1 hour'::interval AS end_ts FROM
>> generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
>> '1 hour'::interval) e (end_ts)) AS e
>>
>> You would join this using an ON condition with an OR (start BETWEEN
>> [...] OR end BETWEEN [...]) - range logic will be better and you may
>> want to adjust the upper bound by negative 1 (nano-second?) to allow
>> for easier "<=" logic if using BETWEEN.
>>
>
> Thanks, that is very helpful, but are you sure CROSS JOIN is what you
> wanted here? using that, I get a 625 row result set where each row from
> the first SELECT is paired up with EVERY row from the second select. I
> would think I would want the first row of the first SELECT paired up
> with only the first row of the second, second row of the first paired
> with the second row of the second, etc - i.e. 24 start and end bounds.
> Or am I missing something?

Given this:

test=> select * from start_end ;
id | start_time | end_time
----+------------------------+------------------------
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07

using Pauls hints I got:

test=> select * from start_end, generate_series(0, 23) as s(h) where h
between extract(hour from start_time) and extract(hour from end_time) ;

id | start_time | end_time | h
----+------------------------+------------------------+----
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11

test=> select h, count(*) from start_end, generate_series(0, 23) as s(h)
where h between extract(hour from start_time) and extract(hour from
end_time) group by h order by h;

h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2

>
>
>
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
>
>> ​David J.​
>>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2015-03-16 23:57:06 Re: Group by range in hour of day
Previous Message Israel Brewster 2015-03-16 23:16:20 Re: Group by range in hour of day