Quick Links

Re: Count dates distinct within an interval

From:	Stuart <deststar(at)blueyonder(dot)co(dot)uk>
To:	Dmitry Tkach <dmitry(at)openratings(dot)com>
Cc:	pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject:	Re: Count dates distinct within an interval
Date:	2003-07-15 17:31:31
Message-ID:	3F143A73.5060204@blueyonder.co.uk
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Dmitry Tkach wrote:
> Hi, everybody!
>
> I was trying to formulate a sql query, that I now think is impossible :-(
> I just thought, I'd run it by you before I give up - perhaps, you guys
> would have some ideas...
>
> Suppose, I have a table like this
>
> create table test
> (
> stuff int,
> stamp timestamp
> );
>
> Now, I want to count the occurences of each value of stuff in the table,
> but so that entries within 24 hours from each other count as one...
> The closest I could think of is:
>
> select stuff, count (distinct date_trunc ('day', stamp)) from test group
> by stuff;
>
> This doesn't do exactly what I need though - for example, if one entry
> is one minute before midnight, and the other one is two minutes later,
> they'd get counted as 2, and what I am looking for is the way to get
> them collapsed into one as long as they are less then 24 hours apart...
>
> Now, I am pretty sure, it is simply impossible to do what I want with
> count (distinct...) because my 'equality' is not transitive - for
> example, three entries, like
> A = 2001 - 01- 01 20:20:00
> B = 2001 - 01 - 02 20:19:00
> C = 2001 - 01 - 02 20:21:00
>
> Should be counted as *two* (A === B, and B === C, but *not* A === C)...
>
> Also, I could certainly write a simple function, that would get all the
> entries in order, and scan through them, counting according to my rules...
> But I was hoping to find some way to do this in plain sql though...
>
> Any ideas?
>
> Thanks!
>
> Dima
>

You would probably be able to speed the following up using immutable
funtions to aid the query, or just a function to do it. However I think
this does what you asked in a query. I've put a script at the end.
hth,
- Stuart
-- s is the stuff to group by
-- dt is the datetime thing
create table Q (
s int4,
dt timestamptz);

truncate Q;

INSERT INTO Q (s,dt) VALUES (1,'2003/01/01'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/02 04:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/01 08:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/02'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/05 23:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/06 22:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (1,'2003/01/07 05:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (3,'2003/01/01'::timestamptz);
INSERT INTO Q (s,dt) VALUES (4,'2003/01/01'::timestamptz);
INSERT INTO Q (s,dt) VALUES (4,'2003/01/02 05:00'::timestamptz);
INSERT INTO Q (s,dt) VALUES (4,'2003/01/03 04:00'::timestamptz);

SELECT COUNT(*),s FROM Q WHERE dt IN (SELECT min(R.dt) FROM Q AS R WHERE
Q.s=R.s)
OR dt IN (SELECT min(P.dt) FROM Q AS P WHERE P.s=Q.s AND
P.dt>=date_trunc('day',Q.dt)-
CASE WHEN
(SELECT max(dt)::time FROM Q AS T WHERE Q.s=T.s AND T.dt<Q.dt AND NOT
EXISTS
(SELECT * FROM Q AS U WHERE T.dt-'1 day'::interval<U.dt AND T.dt>U.dt
AND U.s=Q.s))
>Q.dt::time THEN '1 day'::interval
ELSE
'0 day'::interval
END
+(SELECT max(dt)::time FROM Q AS T WHERE Q.s=T.s AND T.dt<Q.dt AND NOT
EXISTS
(SELECT * FROM Q AS U WHERE T.dt-'1 day'::interval<U.dt AND T.dt>U.dt
AND U.s=Q.s)))
GROUP BY s;

In response to

Count dates distinct within an interval at 2003-07-15 14:33:47 from Dmitry Tkach

Browse pgsql-sql by date

	From	Date	Subject
Next Message	David Olbersen	2003-07-15 17:36:42	Functional Indexes
Previous Message	Scott Cain	2003-07-15 16:47:32	Re: Cannot insert dup id in pk