count number of concurrent requests

From: Ondrej Ivanič <ondrej(dot)ivanic(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org general" <pgsql-general(at)postgresql(dot)org>
Subject: count number of concurrent requests
Date: 2012-08-18 13:11:51
Message-ID: CAM6mieLP8GcGNCsZOFPrA2SEiZFNikcwC62DC1gA2T+Om5Kgvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I have the following table:

dwh=> \d events
Table "public.events"
Column | Type | Modifiers
------------------+-----------------------------+-----------
datetime | timestamp without time zone |
request_duration | integer |

dwh=> select count(*) from events;
count
----------
82912116
(1 row)

Each row represents one event which started at 'datetime' and finished
at 'datetime + request_duration'. I would like to know how many other
events started between 'datetime' and ''datetime + request_duration'
(ie. concurrency).

I've started with this query (CTE + join):
with e as (
select
datetime as date_s,
datetime + (request_duration::text || ' msec')::interval as date_e,
request_duration
from events
where datetime < '2012-08-01 00:01:00'
)
select
e1.date_s,
e1.date_e,
count(*) as "count",
count(case when e1.request_duration > 1000 then true else null
end) as "over 1000"
from e as e1
left join e as e2 on (e2.date_s between e1.date_s and e1.date_e)
group by e1.date_s, e1.date_e
having count(case when e1.request_duration > 1000 then true else null end) > 0

which is incredibly slow (as expected) and I can not analyse more than
several minutes of real traffic. I need to run this query over few
days at least. Second try was this one:

select
date_s,
date_e,
counts[1] as "count",
counts[2] as "over 1000"
from (
select
datetime as date_s,
datetime + (request_duration::text || ' msec')::interval as date_e,
(
select
array[
count(*),
count(case when ee.request_duration > 1000 then
true else null end)
]
from events ee
where ee.datetime < '2012-08-01 00:01:00'
and ee.datetime >= e.datetime
and ee.datetime <= e.datetime +
(e.request_duration::text || ' msec')::interval
) as counts
from events as e
where datetime < '2012-08-01 00:01:00'
) as x
where counts[2] > 0

Which is much better (like few minutes per day) but I can not use any
of those queries because:
- events table is partitioned so the first one is not "going to work"
at all and second could be fixed by adding date ranges to "select ...
from events ee" query
- and I would like to have one query across two database platform but
other one doesn't support "correlated subquery"

Hence I've written simple PHP script which looks like "Window
Function". I tried to rewrite query using window function(s) but I
can't get my head around it.

Any ideas? I'm looking for something which is relatively fast and
doesn't use "correlated subquery"...

Thanks!

--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)

Browse pgsql-general by date

  From Date Subject
Next Message Wolfgang Keller 2012-08-18 14:24:32 Re: Messy data models (Re: Visualize database schema)
Previous Message Chris Travers 2012-08-18 11:01:06 Re: Schemas vs partitioning vs multiple databases for archiving