From: | Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com> |
---|---|
To: | Markus Schaber <schabi(at)logix-tt(dot)com> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: Ugly group by problem |
Date: | 2006-03-29 15:42:03 |
Message-ID: | Pine.LNX.4.44.0603291838520.5673-100000@matrix.gatewaynet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
O Markus Schaber έγραψε στις Mar 29, 2006 :
> Hello,
>
> I have a table filled from third party that basically has the following
> structure:
>
> link_id | feat_id | other | columns...
> --------+---------+-------+-----------
> 1 | 2 | ...
> 2 | 5 | ...
> 2 | 23 | ...
> 3 | 5 | ...
> 3 | 23 | some | data
> 3 | 23 | other | data
> 5 | 23 | ...
> 9 | 23 | ...
>
> This structure is fixed, and we can't change it, but we can create
>
> We have about 37 million different link_ids, and 35 million feat_ids.
> There are feat_ids that appear at several thousand link_ids, but a
> link_id does not have more than a douzen feat_ids.
>
> Now I need to group together all link_ids that have the same set of
> feat_ids. In the example above, the sets would be (1), (2,3) and (5,9),
> and the other columns would be run through some aggregate functions.
>
> Currently, this is done via an external JAVA application, but I'm
> looking for a way to express this via sql / plpgsql to ease deployment.
>
> I could imagine some ugly code using ARRAY (not tried yet), but how
> would you pack this problem? It seems that I'm just stuck in my thoughts
> and miss the beauty way to solve it.
Well i did it with arrays, i dont know the performance
implications tho:
foodb=# SELECT * from markustest ;
link_id | feat_id | other
---------+---------+-------
1 | 2 | 1
2 | 5 | 2
2 | 23 | 2
3 | 5 | 3
3 | 23 | 3
3 | 23 | 3
5 | 23 | 5
9 | 23 | 9
(8 rows)
foodb=# SELECT qoo.foo2,sum(qoo.foo3) from (SELECT mt.link_id as
foo,_int_union(array(select mt2.feat_id from markustest mt2 where
mt2.link_id=mt.link_id order by mt2.feat_id),'{}') as foo2,other::int4 as
foo3 from markustest mt) as qoo GROUP BY qoo.foo2;
foo2 | sum
--------+-----
{2} | 1
{5,23} | 13
{23} | 14
(3 rows)
foodb=#
The _int_union trick is to force the arrays to have unique values.
The order by has the meaning that '{5,23}' and '{23,5}' should be treated
the same way.
>
> Thanks,
> Markus
>
--
-Achilleus
From | Date | Subject | |
---|---|---|---|
Next Message | Scott Marlowe | 2006-03-29 16:50:03 | Re: [SQL] Flight numbers data |
Previous Message | Markus Schaber | 2006-03-29 14:44:10 | Re: Ugly group by problem |