Re: Statistics query

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: chris(at)chriscurvey(dot)com
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Statistics query
Date: 2013-04-10 17:51:52
Message-ID: 5165A6B8.1000500@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 04/10/2013 10:31 AM, Chris Curvey wrote:
>
> On Wed, Apr 10, 2013 at 12:30 PM, Steve Crawford
> <scrawford(at)pinpointresearch(dot)com
> <mailto:scrawford(at)pinpointresearch(dot)com>> wrote:
>
> I'm seeking ideas on the best way to craft the following query.
> I've stripped everything down to the bare essentials and
> simplified it below.
>
> Input data has a timestamp (actually an int received from the
> system in the form of a Unix epoch), a unit identifier and a status:
>
> event_time | unit_id | status
> ------------+---------+--------
> 1357056011 | 60 | 1
> 1357056012 | 178 | 0
> 1357056019 | 168 | 0
> 1357056021 | 3 | 0
> 1357056021 | 4 | 1
> 1357056021 | 179 | 0
> 1357056022 | 0 | 1
> 1357056022 | 1 | 0
> 1357056023 | 2 | 0
> 1357056024 | 9 | 0
> 1357056025 | 5 | 0
> 1357056025 | 6 | 0
> 1357056026 | 7 | 1
> ...
>
> A given unit_id cannot have two events at the same time (enforced
> by constraints).
>
> Given a point in time I would like to:
>
> 1. Identify all distinct unit_ids with an entry that exists in the
> preceding hour then
>
> 2. Count both the total events and sum the status=1 events for the
> most recent 50 events for each unit_id that fall within a limited
> period (e.g. don't look at data earlier than midnight). So unit_id
> 60 might have 50 events in the last 15 minutes while unit_id 4
> might have only 12 events after midnight.
>
> The output would look something like:
>
> unit_id | events | status_1_count
> ---------+--------+----------------
> 1 | 50 | 34
> 2 | 27 | 18
> 1 | 50 | 34
> 1 | 2 | 0
> ...
>
> Each sub-portion is easy and while I could use external processing
> or set-returning functions I was hoping first to find the
> secret-sauce to glue everything together into a single query.
>
> Cheers,
> Steve
>
>
> something like
>
> select unit_id, count(*), sum(status)
> from mytable a
> where event_time >= [whatever unix epoch translates to "last midnight"]
> and exists
> ( select *
> from mytable b
> where b.unit_id = a.unit_id
> and b.epoch >= [unix epoch that translates to "one hour ago"])
> group by unit _id;
>
> 1) I think figuring out the unix epoch should be reasonable...but I
> don't know how to do it off the top of my head.
> 2) I could completely be misunderstanding this. I'm not sure why the
> example results would have unit id 1 repeated. (which my suggestion
> WON'T do)

Because I screwed up cutting and pasting to make an example. The unit_id
in the output should, in fact, be distinct:

unit_id | events | status_1_count
---------+--------+----------------
1 | 50 | 34
2 | 27 | 18
3 | 50 | 34
4 | 2 | 0

You are correct, epoch is easy:
abstime(epoch)
or
extract(epoch from timestamptz)
depending on which direction you are going or for an hour difference
just subtract 3600.

The solution, however, misses the important complicating gotcha. The
units I want listed are only those that have had at least one event in
the last hour. But for each such unit, I only want the statistics to
reflect the most-recent 50 events (even if those events occurred earlier
than the current hour) provided the event occurred on the current date.
So the events column can never be less than 1 nor more than 50.

For example...

One unit might have a single event at the start of the last hour but 49
more in the preceding 10 minutes. I want to see that unit and the stats
for those 50 events.

Same thing if a unit has 50 events clustered at the end of an hour - I
don't want the earlier ones.

Another might have 50 events early in the day but none this hour. I
don't want to see that one.

But I do want to see the one that had an event in the last hour late in
the day along with the 48 other events that have accumulated since midnight.

Cheers,
Steve

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John R Pierce 2013-04-10 19:28:18 Re: How to convert US date format to European date format ?
Previous Message Chris Curvey 2013-04-10 17:31:49 Re: Statistics query