Re: time series data

From: Melvin Davidson <melvin6925(at)gmail(dot)com>
To: Clifford Snow <clifford(at)snowandsnow(dot)us>
Cc: Khalil Khamlichi <khamlichi(dot)khalil(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: time series data
Date: 2017-10-02 14:06:32
Message-ID: CANu8FiyxwQ6S0T-2dWhp0GksvJ85Ec9yqfpoeZ8Yb9LjaMs45A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, Oct 1, 2017 at 6:20 PM, Clifford Snow <clifford(at)snowandsnow(dot)us>
wrote:

> I have a stream that updates every minute with a trigger that updates
> another table with information from the stream. That way I'm constantly
> updated with no need to run a script to update before I want a report.
>
> Clifford
>
> On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <
>> khamlichi(dot)khalil(at)gmail(dot)com> wrote:
>>
>>> Hi everyone,
>>>
>>> I have a data stream of a call center application coming in to postgres
>>> in this format :
>>>
>>> user_name, user_status, event_time
>>>
>>> 'user1', 'ready', '2017-01-01 10:00:00'
>>> 'user1', 'talking', '2017-01-01 10:02:00'
>>> 'user1', 'after_call', '2017-01-01 10:07:00'
>>> 'user1', 'ready', '2017-01-01 10:08:00'
>>> 'user1', 'talking', '2017-01-01 10:10:00'
>>> 'user1', 'after_call', '2017-01-01 10:15:00'
>>> 'user1', 'paused', '2017-01-01 10:20:00'
>>> ...
>>> ...
>>>
>>> so as you see each new insert of an "event" is in fact the start_time of
>>> that event and also the end_time of the previous one so should be used to
>>> calculate the duration of this previous one.
>>>
>>> What is the best way to get user_status statistics like total duration,
>>> frequency, avg ...etc , does any body have an experience with this sort of
>>> data streams ?
>>>
>>>
>>> Thanks in advance.
>>>
>>
>> Just a suggestion, but here is what I would do.
>> First, create your tables similar to as follows
>>
>> CREATE TABLE status
>> (
>> call_status varchar(10) NOT NULL,
>> CONSTRAINT status_pk PRIMARY KEY (call_status)
>> );
>>
>> INSERT INTO status
>> (call_status)
>> VALUES
>> ('ready'),
>> ('talking'),
>> ('after_call');
>>
>> CREATE TABLE user_sessions
>> (
>> username name NOT NULL,
>> session_id bigint NOT NULL,
>> call_status varchar(10) NOT NULL,
>> call_time timestamp NOT NULL,
>> CONSTRAINT user_sessions_pk PRIMARY KEY (username,
>> session_id,call_status),
>> CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
>> REFERENCES status(call_status)
>> );
>>
>> Next, you will need to generate a unique session_id for each
>> user, but only for when call_status is 'ready'. So probably
>> a table of the form:
>>
>> CREATE TABLE current_session
>> (
>> username name NOT NULL,
>> session_id serial NOT NULL,
>> CONSTRAINT current_session_pk PRIMARY KEY (username)
>> );
>>
>> Then all you need to do is:
>> 1. Update current_session and get the new session_id each time a user
>> connects (call_status = 'ready'.
>> Probably best to use a BEFORE trigger to do this, but you will need to
>> code it yourself.
>>
>> 2. You can then do
>>
>> SELECT username,
>> age ( (SELECT call_time FROM current_session WHERE call_status =
>> 'talking'),
>> ( SELECT call_time FROM current_session WHERE call_status =
>> 'after_call')
>> ) as duration
>> FROM user_sessions
>> WHERE username = '*actual_user_name*'
>> AND session_id = *actual_session_id*;
>>
>> You can use similar queries for avg and frequency.
>>
>> --
>> *Melvin Davidson*
>> I reserve the right to fantasize. Whether or not you
>> wish to share my fantasy is entirely up to you.
>>
>
>
>
> --
> @osm_seattle
> osm_seattle.snowandsnow.us
> OpenStreetMap: Maps with a human touch
>

I thought about the table design for user_sessions and came up
with a better one:

CREATE TABLE user_sessions
(
username name NOT NULL,
session_id bigint NOT NULL,
call_status varchar(10) NOT NULL,
call_ready timestamp NOT NULL,
call_talking timestamp,
call_after_call timestamp,
call_duration interval,
CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id),
CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
REFERENCES status(call_status)
);

So in essence, when the call starts, just do:

INSERT INTO user_sessions
(username, call_ready)
VALUES
('actual_user_name', now() );

Then
SELECT max(session_id) AS current_session
FROM user_sessions
WHERE username = 'actual_user_name';

When talking starts:
UPDATE user_sessions
SET call_status = 'talking',
call_talking = now()
WHERE username = 'actual_user_name'
AND session_id = current_session;

When call ends:
UPDATE user_sessions
SET call_status = 'after_call',
call_after_call = now()
WHERE username = 'actual_user_name'
AND session_id = current_session;

Now all you have to do to get call length is:

SELECT username,
age ( call_after_call, call_talking ) as duration
FROM user_sessions
WHERE username = 'actual_user_name'
AND session_id = current_session;

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2017-10-02 14:06:45 Re: Setting search_path ignored
Previous Message Guyren Howe 2017-10-02 14:00:45 Setting search_path ignored