Re: Need some help with a query (uniq -c)

From: Kenichiro Tanaka <ketanaka(at)ashisuto(dot)co(dot)jp>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Need some help with a query (uniq -c)
Date: 2010-04-13 11:12:40
Message-ID: 4BC451A8.3010900@ashisuto.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello.

I try with "With Query".
http://www.postgresql.org/docs/8.4/static/queries-with.html

#We can use "With Queries" > v8.4
#That'll only work if the time values are contiguous, but there's probably a
#similar trick for non-contiguous ,too.

--create data
drop table foo;
create table foo( time int,message text);
insert into foo values(1,'a');
insert into foo values(2,'b');
insert into foo values(3,'b');
insert into foo values(4,'c');
insert into foo values(5,'a');
insert into foo values(6,'c');
insert into foo values(7,'c');
insert into foo values(8,'a');
insert into foo values(9,'a');
insert into foo values(10,'a');

--begin Answer
with recursive r as (
select foo.time,foo.message,1 as dummy from foo
union all
select foo.time,foo.message,r.dummy+1 from foo , r
where foo.time=r.time-1 and foo.message=r.message
)
,rr as (
select foo.time,foo.message,'OLID' as flag from foo
union all
select foo.time,foo.message,'DUP' as flag from foo , rr
where foo.time-1=rr.time-2 and foo.message=rr.message
)
select time min,time+max(dummy)-1 max,message,max(dummy) counts
from r where time not in (select distinct (time+1) times from rr
where flag='DUP') group by time,message order by time;

--result
postgres(# where flag='DUP') group by time,message order by time;
min | max | message | counts
-----+-----+---------+--------
1 | 1 | a | 1
2 | 3 | b | 2
4 | 4 | c | 1
5 | 5 | a | 1
6 | 7 | c | 2
8 | 10 | a | 3
(6 rows)

--end

But I think some one can provide more simple SQL.

Thank you.
> On Apr 12, 2010, at 11:31 AM, Scott Marlowe wrote:
>
>
>> On Mon, Apr 12, 2010 at 12:22 PM, A B<gentosaker(at)gmail(dot)com> wrote:
>>
>>> Hello!
>>>
>>> I have a table (think of it as a table of log messages)
>>>
>>> time | message
>>> -----------------------
>>> 1 | a
>>> 2 | b
>>> 3 | b
>>> 4 | b
>>> 5 | a
>>>
>>> the three 'b' are the same message, so I would like to write a query
>>> that would give me a result that is similar to what the unix command
>>> "uniq -c" would give:
>>>
>>> first | message | last | count
>>> --------------------------------------
>>> 1 | a | 1 | 1
>>> 2 | b | 4 | 3<--- here it squeezes
>>> similar consecutive messages into a single row
>>> 5 | a | 5 | 1
>>>
>>> How do I write such a command?
>>>
>> Pretty straight ahead:
>>
>> select min(t), message, max(t), count(*) from table group by message.
>>
> That was my first though too, but it combines everything not just adjacent messages.
>
> Something like this, maybe
>
> select t1.message, t1.time as first, t2.time as last, t2.time-t1.time+1 as count
> from foo as t1, foo as t2
> where t1.time<= t2.time and t1.message = t2.message
> and not exists
> (select * from foo as t3
> where (t3.time between t1.time and t2.time and t3.message<> t1.message)
> or (t3.time = t2.time + 1 and t3.message = t1.message)
> or (t3.time = t1.time - 1 and t3.message = t1.message));
>
> message | first | last | count
> ---------+-------+------+-------
> a | 1 | 1 | 1
> b | 2 | 4 | 3
> a | 5 | 5 | 1
>
> That'll only work if the time values are contiguous, but there's probably a
> similar trick for non-contiguous.
>
> Cheers,
> Steve
>
>
>

--
================================================
Kenichiro Tanaka
K.K.Ashisuto
http://www.ashisuto.co.jp/english/index.html
================================================

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andrus 2010-04-13 11:49:59 Email address column verification for address list
Previous Message tv 2010-04-13 11:05:40 Re: Dynamic Catalog Views