| From: | Steve Crawford <scrawford(at)pinpointresearch(dot)com> |
|---|---|
| To: | chris(at)chriscurvey(dot)com |
| Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: Statistics query |
| Date: | 2013-04-10 17:51:52 |
| Message-ID: | 5165A6B8.1000500@pinpointresearch.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
On 04/10/2013 10:31 AM, Chris Curvey wrote:
>
> On Wed, Apr 10, 2013 at 12:30 PM, Steve Crawford
> <scrawford(at)pinpointresearch(dot)com
> <mailto:scrawford(at)pinpointresearch(dot)com>> wrote:
>
> I'm seeking ideas on the best way to craft the following query.
> I've stripped everything down to the bare essentials and
> simplified it below.
>
> Input data has a timestamp (actually an int received from the
> system in the form of a Unix epoch), a unit identifier and a status:
>
> event_time | unit_id | status
> ------------+---------+--------
> 1357056011 | 60 | 1
> 1357056012 | 178 | 0
> 1357056019 | 168 | 0
> 1357056021 | 3 | 0
> 1357056021 | 4 | 1
> 1357056021 | 179 | 0
> 1357056022 | 0 | 1
> 1357056022 | 1 | 0
> 1357056023 | 2 | 0
> 1357056024 | 9 | 0
> 1357056025 | 5 | 0
> 1357056025 | 6 | 0
> 1357056026 | 7 | 1
> ...
>
> A given unit_id cannot have two events at the same time (enforced
> by constraints).
>
> Given a point in time I would like to:
>
> 1. Identify all distinct unit_ids with an entry that exists in the
> preceding hour then
>
> 2. Count both the total events and sum the status=1 events for the
> most recent 50 events for each unit_id that fall within a limited
> period (e.g. don't look at data earlier than midnight). So unit_id
> 60 might have 50 events in the last 15 minutes while unit_id 4
> might have only 12 events after midnight.
>
> The output would look something like:
>
> unit_id | events | status_1_count
> ---------+--------+----------------
> 1 | 50 | 34
> 2 | 27 | 18
> 1 | 50 | 34
> 1 | 2 | 0
> ...
>
> Each sub-portion is easy and while I could use external processing
> or set-returning functions I was hoping first to find the
> secret-sauce to glue everything together into a single query.
>
> Cheers,
> Steve
>
>
> something like
>
> select unit_id, count(*), sum(status)
> from mytable a
> where event_time >= [whatever unix epoch translates to "last midnight"]
> and exists
> ( select *
> from mytable b
> where b.unit_id = a.unit_id
> and b.epoch >= [unix epoch that translates to "one hour ago"])
> group by unit _id;
>
> 1) I think figuring out the unix epoch should be reasonable...but I
> don't know how to do it off the top of my head.
> 2) I could completely be misunderstanding this. I'm not sure why the
> example results would have unit id 1 repeated. (which my suggestion
> WON'T do)
Because I screwed up cutting and pasting to make an example. The unit_id
in the output should, in fact, be distinct:
unit_id | events | status_1_count
---------+--------+----------------
1 | 50 | 34
2 | 27 | 18
3 | 50 | 34
4 | 2 | 0
You are correct, epoch is easy:
abstime(epoch)
or
extract(epoch from timestamptz)
depending on which direction you are going or for an hour difference
just subtract 3600.
The solution, however, misses the important complicating gotcha. The
units I want listed are only those that have had at least one event in
the last hour. But for each such unit, I only want the statistics to
reflect the most-recent 50 events (even if those events occurred earlier
than the current hour) provided the event occurred on the current date.
So the events column can never be less than 1 nor more than 50.
For example...
One unit might have a single event at the start of the last hour but 49
more in the preceding 10 minutes. I want to see that unit and the stats
for those 50 events.
Same thing if a unit has 50 events clustered at the end of an hour - I
don't want the earlier ones.
Another might have 50 events early in the day but none this hour. I
don't want to see that one.
But I do want to see the one that had an event in the last hour late in
the day along with the 48 other events that have accumulated since midnight.
Cheers,
Steve
| From | Date | Subject | |
|---|---|---|---|
| Next Message | John R Pierce | 2013-04-10 19:28:18 | Re: How to convert US date format to European date format ? |
| Previous Message | Chris Curvey | 2013-04-10 17:31:49 | Re: Statistics query |