From: | Schneider <schneider(at)ardentperf(dot)com> |
---|---|
To: | Khalil Khamlichi <khamlichi(dot)khalil(at)gmail(dot)com> |
Cc: | Melvin Davidson <melvin6925(at)gmail(dot)com>, Clifford Snow <clifford(at)snowandsnow(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: time series data |
Date: | 2017-10-03 18:58:03 |
Message-ID: | CA+fnDAbKq2i9d1z5GBCq0p=ZM10J8aLJ6ee4kXvJMHJYEuPJzQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Mon, Oct 2, 2017 at 10:27 AM, Khalil Khamlichi
<khamlichi(dot)khalil(at)gmail(dot)com> wrote:
> we have records like this
>
> ccdb1=# select user_name, agent_status, event_time from cc_events ;
>
> user_name | agent_status | event_time
> -----------+--------------+---------------------
> user1 | ready | 2017-01-01 10:00:00
> user1 | talking | 2017-01-01 10:02:00
> user1 | after_call | 2017-01-01 10:08:00
> user1 | ready | 2017-01-01 10:10:00
> user1 | talking | 2017-01-01 10:12:00
> user1 | after_call | 2017-01-01 10:15:00
> user1 | paused | 2017-01-01 10:17:00
> user1 | ready | 2017-01-01 10:25:00
> user1 | talking | 2017-01-01 10:26:00
> (9 rows)
>
>
> so user1 was READY at 2017-01-01 10:00:00 then he received a call
> that he attended at 2017-01-01 10:02:00 and so on ...
> so user1 was ready for 2 minutes, then he was talking for 6 minutes
> then he was in after_call (doing after call work) for 2 minutes and
> this is the kind of information we want to query.
>
> my solution so far that I came with, is in my table I have 1 more
> field : end_time
> so when an event comes in and before the insert I do :
> update cc_events set end_time = current_timestamp where user_name =
> 'user_of_event' and end_time is null;
>
> then I insert new event leaving the end_time as null so that next
> event will update it and so on.
>
> its working fine, I have the start and end times for each event, its
> not too painful to query (sum(end-start) while grouping by user_name,
> agent_status), but its one more update on the table and also limited
> in what you can query about,
>
> I know this must be a common problem in every software that deals with
> events, so I suppose something is already built-in in postgres to deal
> with it.
Khalil, changing your schema is one solution with certain benefits -
but it's definitely not necessary when you have the power of
PostgreSQL at your fingertips. You can solve your problem without
changing anything at all. :)
All you need is a window function:
https://www.postgresql.org/docs/9.6/static/tutorial-window.html
Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and
9.3.17 (all the latest minors currently supported on RDS). You can
try these queries on your own system; they should work anywhere.
First, I added a second user to your data set to make sure we were
handling that case correctly.
==========
create table cc_events (user_name varchar(8), agent_status
varchar(20), event_time timestamp);
insert into cc_events values
('user1', 'ready', '2017-01-01 10:00:00'),
('user1', 'talking', '2017-01-01 10:02:00'),
('user2', 'ready', '2017-01-01 10:04:00'),
('user2', 'talking', '2017-01-01 10:05:00'),
('user1', 'after_call', '2017-01-01 10:07:00'),
('user1', 'ready', '2017-01-01 10:08:00'),
('user1', 'talking', '2017-01-01 10:10:00'),
('user1', 'after_call', '2017-01-01 10:15:00'),
('user2', 'after_call', '2017-01-01 10:18:00'),
('user1', 'paused', '2017-01-01 10:20:00'),
('user2', 'paused', '2017-01-01 10:21:00');
select * from cc_events order by user_name, event_time;
==========
here's a basic window function in action:
==========
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time
from cc_events order by event_time;
user_name | agent_status | event_time | next_event_time
-----------+--------------+---------------------+---------------------
user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00
user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00
user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00
user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00
user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00
user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00
user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00
user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00
user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00
user1 | paused | 2017-01-01 10:20:00 |
user2 | paused | 2017-01-01 10:21:00 |
==========
and now we just add one more column which does the subtraction to
calculate the duration:
==========
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time,
(lead(event_time) over (partition by user_name order by event_time))
- event_time as duration
from cc_events order by event_time;
user_name | agent_status | event_time | next_event_time | duration
-----------+--------------+---------------------+---------------------+----------
user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00
user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00
user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00
user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00
user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00
user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00
user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00
user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00
user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00
user1 | paused | 2017-01-01 10:20:00 | |
user2 | paused | 2017-01-01 10:21:00 | |
==========
it might also be convenient to wrap the window function in a common
table expression
https://www.postgresql.org/docs/9.6/static/queries-with.html
==========
with calculate_next_events as (
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time
from cc_events order by event_time
)
select user_name, agent_status, next_event_time-event_time duration
from calculate_next_events order by event_time;
==========
Finally, if you really want to supercharge this and power-up even
more, besides temporal databases you might check out this recent blog
post about implementing a state machine in postgresql... it's really
interesting and closely related to what you're solving.
https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html
Hope this is helpful. Great to see that you're working on PostgreSQL -
it's a powerful engine to build with!
-Jeremy
From | Date | Subject | |
---|---|---|---|
Next Message | Sandeep Gupta | 2017-10-04 13:25:51 | error: initdb: could not look up effective user ID 21073: user does not exist |
Previous Message | milist ujang | 2017-10-03 16:21:56 | Re: BDR, wal sender, high system cpu, mutex_lock_common |