Re: Oracle Analytical Functions

From: Willem Buitendyk <willem(at)pcfish(dot)ca>
To: Adam Rich <adam(dot)r(at)sbcglobal(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Oracle Analytical Functions
Date: 2008-01-31 20:48:06
Message-ID: 47A23406.2000803@pcfish.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Here is a little test example. It seems that the second order by
condition is not working - in this case datetime.

create table arb_test (
client_id integer,
arbnum integer);

insert into arb_test values (2,1);
insert into arb_test values (2,33);
insert into arb_test values (2,6);
insert into arb_test values (2,76);
insert into arb_test values (2,111);
insert into arb_test values (2,10);
insert into arb_test values (2,55);
insert into arb_test values (7,12);
insert into arb_test values (7,6);
insert into arb_test values (7,144);
insert into arb_test values (7,63);
insert into arb_test values (7,87);
insert into arb_test values (7,24);
insert into arb_test values (7,22);
insert into arb_test values (1,14);
insert into arb_test values (1,23);
insert into arb_test values (1,67);
insert into arb_test values (1,90);
insert into arb_test values (1,2);
insert into arb_test values (1,5);
insert into arb_test values (5,8);
insert into arb_test values (5,42);
insert into arb_test values (5,77);
insert into arb_test values (5,9);
insert into arb_test values (5,89);
insert into arb_test values (5,23);
insert into arb_test values (5,11);

DROP SEQUENCE if exists seq1;
DROP SEQUENCE if exists seq2;
CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;

select a.client_id, b.arbnum, a.arbnum as previousarbnum, (b.arbnum -
a.arbnum) as diffarbnum from
(select nextval('seq1') as s, client_id, arbnum from arb_test
order by client_id, arbnum OFFSET 0) as a
inner join
(select nextval('seq2') as s, client_id, arbnum from arb_test
order by client_id, arbnum OFFSET 0) as b
on a.s=(b.s-1) where a.client_id=b.client_id;

--create or replace view arb_view as select * from arb_test order by
client_id, arbnum;

Here are the results:

client_id | arbnum | previousarbnum | diffarbnum
-----------+--------+----------------+------------
1 | 23 | 14 | 9
1 | 67 | 23 | 44
1 | 90 | 67 | 23
1 | 2 | 90 | -88
1 | 5 | 2 | 3
2 | 33 | 1 | 32
2 | 6 | 33 | -27
2 | 76 | 6 | 70
2 | 111 | 76 | 35
2 | 10 | 111 | -101
2 | 55 | 10 | 45
5 | 42 | 8 | 34
5 | 77 | 42 | 35
5 | 9 | 77 | -68
5 | 89 | 9 | 80
5 | 23 | 89 | -66
5 | 11 | 23 | -12
7 | 6 | 12 | -6
7 | 144 | 6 | 138
7 | 63 | 144 | -81
7 | 87 | 63 | 24
7 | 24 | 87 | -63

When I used a sorted view:

create or replace view arb_view as select * from arb_test order by
client_id, arbnum;

and redid it the results are:

client_id | arbnum | previousarbnum | diffarbnum
-----------+--------+----------------+------------
1 | 5 | 2 | 3
1 | 14 | 5 | 9
1 | 23 | 14 | 9
1 | 67 | 23 | 44
1 | 90 | 67 | 23
2 | 6 | 1 | 5
2 | 10 | 6 | 4
2 | 33 | 10 | 23
2 | 55 | 33 | 22
2 | 76 | 55 | 21
2 | 111 | 76 | 35
5 | 9 | 8 | 1
5 | 11 | 9 | 2
5 | 23 | 11 | 12
5 | 42 | 23 | 19
5 | 77 | 42 | 35
5 | 89 | 77 | 12
7 | 12 | 6 | 6
7 | 22 | 12 | 10
7 | 24 | 22 | 2
7 | 63 | 24 | 39
7 | 87 | 63 | 24
7 | 144 | 87 | 57
(23 rows)

This works the way it should.

--drop table arb_test;
--drop view arb_view;

willem
> The 'all_client_times' table has 753698 rows. The lagfunc() on the
> sorted view returns 753576 rows and appears to work exactly as
> needed. Using the function on an unsorted table returns only 686 rows
> and is missing a whole lot of data. Running the count query returns
> 122 - which is correct as the amount of clients that I have. Each
> client has between 5 - 7K records each.
>
> The way I see it is for each client there will be one row, namely, the
> first in the series, that will not be included in the final results as
> it would not have a previous time. With that in mind, if I take my
> table row count as 753698 and minus the amount of clients I have,
> 122, then I should get the number of results as 753576 which is
> correct when I use your methods on a sorted table but which is not
> correct when I use your methods on an unsorted table.
>
> willem
>
> Adam Rich wrote:
>> Hi Willem,
>>
>>
>>> for some reason the order by's aren't working.
>>>
>>
>> Could you provide more details? Do you get a specific error message?
>>
>>
>>> only returning 658 rows instead of the 750K.
>>>
>>
>> You should not expect the same row count in both source table and
>> result set. Even in your example -- you provided 8 source rows, and
>> 4 result rows. You can determine the correct number of results via
>> "the number of records, related to client_ids having two or more
>> records in all_client_times, minus one". It may be true that you
>> have 750k
>> records but only 658 rows that satisfy this requirement.
>>
>> What do you get for this query?
>>
>> select count(*) from ( select client_id, count(*) as rows from
>> all_client_times group by client_id having count(*) > 1 ) as x
>>
>>
>> Adam
>>
>>
>>
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org/
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adam Rich 2008-01-31 21:04:42 Re: Oracle Analytical Functions
Previous Message Alvaro Herrera 2008-01-31 20:39:25 Re: Log file permissions?