From: | dave <audiotecture(at)web(dot)de> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: Recursive merging of overlapping arrays in a column |
Date: | 2015-09-20 15:57:17 |
Message-ID: | 1442764637710-5866579.post@n5.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Sorry, here is the post again in plain text...
i have the following Table:
CREATE TABLE arrays (id SERIAL, arr INT[]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);
I want to merge the arrays which have overlapping elements, so that I get
the result which doesn't contain overlapping arrays anymore:
arr
--------------------------
{1,3,6,9,10,11,18,40,44}
{2,4,7}
{8,12,19}
{42,63,102,140,420}
I am not an expert in SQL and it took me a long time to come up with this
solution:
WITH RECURSIVE clusters AS (
select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
from arrays a1 cross join arrays a2
where a1.arr && a2.arr AND
least(a1.id,a2.id) != greatest(a1.id, a2.id)
UNION
select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
from arrays a1 cross join clusters a2
where a1.arr && a2.arr AND
a1.arr != a2.arr
)
SELECT arr FROM (
SELECT * FROM (
SELECT DISTINCT ON (arr[1]) arr
FROM clusters
ORDER BY arr[1], array_length(arr, 1) DESC
) AS c
UNION
SELECT arr FROM arrays WHERE id NOT IN (
SELECT DISTINCT a1.id
FROM arrays a1 CROSS JOIN arrays a2
WHERE a1.arr && a2.arr AND
least(a1.id,a2.id) != greatest(a1.id, a2.id)
)
) AS clustertable
ORDER BY arr;
Which gives me the result:
arr
--------------------------
{1,3,6,9,10,11,18,40,44}
{2,4,7}
{3,10,18,40,44}
{8,12,19}
{42,63,102,140,420}
(5 rows)
Result number 3 is contained in number one and shouldn't be in the output
anymore, because I only want non overlapping arrays in the result.
Another problem I encountered is that the performance of this query seems to
be very bad. I tried running it on a larger table (~400000 arrays) and it is
still running after ~10h.
I would appreciate any input on this problems, so it would be nice if anyone
could give me a hint how to get only the merged arrays without overlaps in
the resultset and maybe how to build a more elegant and efficient query.
Thanks in advance,
Dave
--
View this message in context: http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866579.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2015-09-20 17:43:20 | Re: Recursive merging of overlapping arrays in a column |
Previous Message | Rob Sargent | 2015-09-20 14:00:00 | Re: Recursive merging of overlapping arrays in a column |