Re: Recursive merging of overlapping arrays in a column

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: dave <audiotecture(at)web(dot)de>
Cc: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Recursive merging of overlapping arrays in a column
Date: 2015-09-20 17:43:20
Message-ID: CAKFQuwZaOBndHjiyD9-b2bE3y2Mbd5ttQajT4BQ48cXbZZwxrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Sunday, September 20, 2015, dave <audiotecture(at)web(dot)de> wrote:

> Sorry, here is the post again in plain text...
>
> i have the following Table:
>
> CREATE TABLE arrays (id SERIAL, arr INT[]);
> INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]);
> INSERT INTO arrays (arr) VALUES (ARRAY[2,4]);
> INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]);
> INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]);
> INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]);
> INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]);
> INSERT INTO arrays (arr) VALUES (ARRAY[2,7]);
> INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]);
> INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]);
>
>
> I want to merge the arrays which have overlapping elements, so that I get
> the result which doesn't contain overlapping arrays anymore:
>
> arr
> --------------------------
> {1,3,6,9,10,11,18,40,44}
> {2,4,7}
> {8,12,19}
> {42,63,102,140,420}
>
>
> I am not an expert in SQL and it took me a long time to come up with this
> solution:
>
> WITH RECURSIVE clusters AS (
> select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
> from arrays a1 cross join arrays a2
> where a1.arr && a2.arr AND
> least(a1.id,a2.id) != greatest(a1.id, a2.id)
>
> UNION
>
> select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arr
> from arrays a1 cross join clusters a2
> where a1.arr && a2.arr AND
> a1.arr != a2.arr
> )
>
> SELECT arr FROM (
> SELECT * FROM (
> SELECT DISTINCT ON (arr[1]) arr
> FROM clusters
> ORDER BY arr[1], array_length(arr, 1) DESC
> ) AS c
>
> UNION
>
> SELECT arr FROM arrays WHERE id NOT IN (
> SELECT DISTINCT a1.id
> FROM arrays a1 CROSS JOIN arrays a2
> WHERE a1.arr && a2.arr AND
> least(a1.id,a2.id) != greatest(a1.id, a2.id)
> )
> ) AS clustertable
> ORDER BY arr;
>
>
> Which gives me the result:
>
> arr
> --------------------------
> {1,3,6,9,10,11,18,40,44}
> {2,4,7}
> {3,10,18,40,44}
> {8,12,19}
> {42,63,102,140,420}
> (5 rows)
>
>
> Result number 3 is contained in number one and shouldn't be in the output
> anymore, because I only want non overlapping arrays in the result.
>
> Another problem I encountered is that the performance of this query seems
> to
> be very bad. I tried running it on a larger table (~400000 arrays) and it
> is
> still running after ~10h.
>
> I would appreciate any input on this problems, so it would be nice if
> anyone
> could give me a hint how to get only the merged arrays without overlaps in
> the resultset and maybe how to build a more elegant and efficient query.
>
>
> Thanks in advance,
>
> Dave
>
>
>
While I've never actually done this personally...

Convert the data into nodes and edges and import it into a graph database.

David J.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message dave 2015-09-20 19:40:12 Re: Recursive merging of overlapping arrays in a column
Previous Message dave 2015-09-20 15:57:17 Re: Recursive merging of overlapping arrays in a column