From: | lists-pgsql(at)useunix(dot)net |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: selecting records X minutes apart |
Date: | 2011-06-05 19:42:52 |
Message-ID: | 20110605194252.GD16254@slacker.ja10629.home |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Thank you all who replied!! It looks like Sugawara's recursive solution does
the trick. Unfortunately performance is quite poor for the sample dataset I'm
working with which is a table of about 50000 records. Indeed, there are
indexes applied to the table. I believe the recursive select is being
executed a great number of times causing the first part of the query to take a
long time.
The fastest solution I've come up with is a plpgsql procedure the loops over a
select where the result is ordered by (id,tstamp) and examines the tstamp
values and only returns rows that meet the interval criteria. This technique
takes roughly 2 seconds to filter out records over my 50000 record sample
set.... which is acceptable but not nearly as elegant as a single SQL
statement.
Again, thank you for all the replies.
Wayne
On Sun, Jun 05, 2011 at 08:52:30PM +0900, Masaru Sugawara wrote:
> On Fri, 3 Jun 2011 15:52:53 -0400
> lists-pgsql(at)useunix(dot)net wrote:
>
>
> I also think you might want to use WITH RECURSIVE clause.
> This SQL searches the case of an interval of 5 minutes or more,
> and sets a relationship between a parent to its child.
>
>
> CREATE TABLE tbl(id integer, ts time) ;
> INSERT INTO tbl VALUES
> (0, '20:00'),
> (0, '20:05'),
> (0, '20:08'),
> (0, '20:10'),
> (0, '20:11'),
> (1, '20:03'),
> (1, '20:04'),
> (1, '20:05'),
> (1, '20:09'),
> (1, '20:16');
>
> SELECT * FROM tbl;
>
>
> --
> WITH RECURSIVE rec(id , ts_p, ts_c) AS (
> SELECT a1.id, min(a1.ts), min(b1.ts)
> FROM tbl AS a1, tbl AS b1
> WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <= b1.ts
> GROUP BY a1.id
> UNION ALL
> SELECT t2.id, t2.ts_p, t2.ts_c
> FROM rec AS t1 INNER JOIN
> (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c
> FROM tbl AS a2, tbl AS b2
> WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts
> GROUP BY a2.id, a2.ts
> UNION ALL
> SELECT a3.id, a3.ts, null
> FROM tbl AS a3
> ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p
> )
> SELECT DISTINCT id, ts_p AS ts FROM rec
> ORDER BY 1,2;
>
>
>
>
>
>
> > I have a table that, at a minimum, has ID and timestamp columns. Records
> > are inserted into with random IDs and timestamps. Duplicate IDs are allowed.
> >
> > I want to select records grouped by ID, ordered by timestamp that are X minutes
> > apart. In this case X is 5.
> >
> > Note, the intervals are not X minute wall clock intervals, they are X minute
> > intervals from the last accepted record, per-id.
> >
> > For instance here is some sample input data:
> >
> > ID TS (HH:MM)
> > -------------------
> > 0 20:00
> > 1 20:03
> > 1 20:04
> > 0 20:05
> > 1 20:05
> > 0 20:08
> > 1 20:09
> > 0 20:10
> >
> > I'd want the select to return:
> >
> > ID TS (HH:MM)
> > -------------------
> > 0 20:00
> > 0 20:05
> > 0 20:10
> > 1 20:03
> > 1 20:09
> >
> >
> > Does my question make sense?
> >
> > Thanks in advance,
> > Wayne
> >
> > --
> > Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-sql
>
>
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
From | Date | Subject | |
---|---|---|---|
Next Message | manuel antonio ochoa | 2011-06-06 23:16:05 | To find process that lock a table |
Previous Message | Masaru Sugawara | 2011-06-05 12:06:25 | Re: selecting records X minutes apart |