Re: Complex Recursive Query

From: matt(at)byrney(dot)com
To: "Jim Garrison" <jim(dot)garrison(at)nwea(dot)org>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Complex Recursive Query
Date: 2014-07-24 00:41:08
Message-ID: 06947bada247128a5e937d15084abeab.squirrel@mail.byrney.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I wouldn't do this with recursion; plain old iteration is your friend
(yes, WITH RECURSIVE is actually iterative, not recursive...)

The algorithm goes like this:

1. Extend your graph relation to be symmetric and transitive.
2. Assign a integer group id to each node.
3. Repeatedly join the node list to the (extended) relation, updating each
node's group id to be the minimum of the group ids of every node it
touches.
4. Stop when the group ids stop changing.

Here's some example code, using your data:

DROP SCHEMA IF EXISTS test CASCADE;
CREATE SCHEMA test;
SET SEARCH_PATH TO test;

CREATE TABLE graph(key1 TEXT, key2 TEXT);

INSERT INTO graph VALUES
('a', 'x'),
('a', 'y'),
('b', 'w'),
('c', 't'),
('x', 'a'),
('y', 'a'),
('y', 'z'),
('z', 'y'),
('t', 'c'),
('w', 'b'),
('w', 'd'),
('d', 'w');

DO
$$
DECLARE
prev INT = 0;
curr INT;
BEGIN
CREATE TABLE rel AS
SELECT key1, key2 FROM graph
UNION
SELECT key2, key1 FROM graph
UNION
SELECT key1, key1 FROM graph
UNION
SELECT key2, key2 FROM graph;

CREATE TABLE group_ids AS
SELECT
key,
ROW_NUMBER() OVER (ORDER BY key) AS group_id
FROM
(
SELECT key1 AS key FROM graph
UNION
SELECT key2 FROM graph
) _;

SELECT SUM(group_id) INTO curr FROM group_ids;
WHILE prev != curr LOOP
prev = curr;
DROP TABLE IF EXISTS min_ids;
CREATE TABLE min_ids AS
SELECT
a.key,
MIN(c.group_id) AS group_id
FROM
group_ids a
INNER JOIN
rel b
ON
a.key = b.key1
INNER JOIN
group_ids c
ON
b.key2 = c.key
GROUP BY
a.key;

UPDATE
group_ids
SET
group_id = min_ids.group_id
FROM
min_ids
WHERE
group_ids.key = min_ids.key;

SELECT SUM(group_id) INTO curr FROM group_ids;
END LOOP;

DROP TABLE IF EXISTS rel;
DROP TABLE IF EXISTS min_ids;
END
$$;

SELECT * FROM group_ids;

Hope it helps,

Matthew

> I have a collection of relationship rows of the form
>
> Table: graph
> key1 varchar
> key2 varchar
>
> A row of the form ('a','b') indicates that 'a' and 'b' are related.
> The table contains many relationships between keys, forming several
> disjoint sets. All relationships are bi-directional, and both
> directions are present. I.e. the table contains a set of disjoint
> graphs specified as node pairs.
>
> For example the set of values
>
> key1 key2
> ----- -----
> a x
> a y
> b w
> c t
> x a
> y a
> y z
> z y
> t c
> w b
> w d
> d w
>
> defines three disjoint groups of connected keys:
>
> a x y z
> c t
> b w d
>
> What I would like to achieve is a single SQL query that returns
>
> group key
> ----- ---
> 1 a
> 1 x
> 1 y
> 1 z
> 2 c
> 2 t
> 3 b
> 3 w
> 3 d
>
> I don't care about preserving the node-to-node relationships, only
> the group membership for each node.
>
> I've been playing with "WITH RECURSIVE" CTEs but haven't had any
> success. I'm not really sure how to express what I want in SQL, and
> it's not completely clear to me that recursive CTEs will help here.
> Also I'm not sure how to generate the sequence numbers for the groups
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message matt 2014-07-24 01:35:35 Table checksum proposal
Previous Message Adrian Klaver 2014-07-24 00:40:01 Re: event triggers in 9.3.4