Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable

From: Will Furnass <will(at)thearete(dot)co(dot)uk>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable
Date: 2010-10-29 19:31:10
Message-ID: 1288380670846-3242676.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

>> rawi <only4com(at)web(dot)de> writes:
> The Error was caused because I used UNION in place of UNION ALL.
>
> I still don't understand why the ARRAY (path) could not be grouped...

Yeah, it's an unimplemented feature --- there's no hashing support for
arrays. I hope to get that done for 8.5. In the meantime you have
to use UNION ALL there. If you really need to eliminate duplicate
rows, you can do that via DISTINCT in the outer query.

I'm trying to do a similar sort of thing for a network containing ~9000
edges. I'm using a WITH RECURSIVE subquery, an array to track visited edges
and a test to see whether the id of the 'current' edge is already in that
array, as per the examples in the PostgreSQL 9.0 docs. Initially my main
query seemed to run indefinitely so I introduced a LIMIT. I then found that
as the LIMIT was increased the number of non-distinct edges returned by the
query grew at a far greater rate than the number of distinct edges (with
LIMIT 50000 the number of distinct edges returned is only 628). Am I right
in thinking that until arrays can be hashed that this issue could well limit
the size of the networks that I can analyse, given a particular hardware
config? Can anyone think of a way to use a temporary table rather than an
array to store visited entities during a graph traversal as a means for
overcoming this problem?

FYI I've been running the following query as a test on PostgreSQL 9.0.

WITH RECURSIVE upstream_pipes( downstream_end, upstream_end, name, depth,
path, cycle) AS (
SELECT p.down_node, p.up_node, p.name, 1, ARRAY[p.name],
False
FROM pipes_table AS p
WHERE p.downstream_end = '61ESI5R0WC'
UNION ALL
SELECT p.downstream_end, p.upstream_end, p.name,
roi.depth + 1,
path || p.name, p.name = ANY(path)
FROM pipes_table AS p, upstream_pipes AS usp
WHERE p.downstream_end = usp.upstream_end
AND NOT cycle
AND ( NOT ( p.end_1_impassible AND p.node_1 =
p.downstream_end ) )
AND ( NOT ( p.end_2_impassible AND p.node_2 =
p.downstream_end ) )
),
q as (
SELECT name FROM build_upstream_roi limit 20000
)
SELECT COUNT(DISTINCT name), COUNT(name) FROM q

Cheers,

Will
--
View this message in context: http://postgresql.1045698.n5.nabble.com/WITH-RECURSIVE-ARRAY-id-All-column-datatypes-must-be-hashable-tp2154712p3242676.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Viktor Bojović 2010-10-30 21:49:29 large xml database
Previous Message Dmitriy Igrishin 2010-10-29 17:11:31 Re: Using PL/pgSQL text argument in 'IN (INT,INT,...)' clause