Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?

From: Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com>
To: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Date: 2012-05-17 09:06:26
Message-ID: CA+4ThdrLephM9kTJr_9icrr4Fy7qvRAzoV=S6xrxkDkKC_=esA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a key value table in my Postgresql db, which represents hierarchical
data through parent_feature_mapping column that points to id of
feature_mapping_id column of the same table.

I need to select root nodes that has children which satisfy various
conditions. The conditions may extend to children of children, so I'm
trying to find roots of trees that contain paths that satisfy the given
constraints.

An example is finding the trees where the root node has type 'COMPOSITION'
and root node's archetypeNodeId attribute has value
'openEHR-EHR-COMPOSITION.discharge.v1' another constraint is root node
having a child of type 'CONTENTITEM' that in turn has a child of type
'ITEMSTRUCTURE'
All nodes in a tree have the same payload Id. The fastest query that I
could write so far is given below.

SELECT root.id from path_value as root
WHERE
root.rm_type_name = 'COMPOSITION'
AND
root.feature_name = 'composition'
AND
EXISTS (SELECT 1 from path_value as anodeid
WHERE
anodeId.parent_feature_mapping_id =
root.feature_mapping_id
AND
anodeId.payload_id = root.payload_id
AND
anodeId.feature_name = 'archetypeNodeId'
AND
anodeId.val_string =
'openEHR-EHR-COMPOSITION.discharge.v1'
LIMIT 1
)

AND
EXISTS (SELECT 1 from path_value as node1
WHERE
node1.payload_id = root.payload_id
AND
node1.parent_feature_mapping_id = root.feature_mapping_id
AND
node1.feature_name = 'content'
AND
node1.rm_type_name = 'CONTENTITEM'
AND
EXISTS (SELECT 1 from path_value as node2
WHERE
node2.payload_id = node1.payload_id
AND
node2.parent_feature_mapping_id =
node1.feature_mapping_id
AND
node2.rm_type_name = 'ITEMSTRUCTURE'
LIMIT 1)
LIMIT 1)

My question is: is this the best approach in terms of performance? This is
an attempt to identify XML payloads that fit certain criteria. I have also
considered using an ltree column that will contain the tree in a from that
I can query as an alternative to sql based method, or I can use xpath
queries on XML payload.

The create statement for my table is as follows:

CREATE TABLE public.path_value (
val_string TEXT,
feature_mapping_id INTEGER NOT NULL,
parent_feature_mapping_id INTEGER,
feature_name TEXT,
rm_type_name TEXT,
path INTEGER NOT NULL,
payload_id INTEGER NOT NULL,
id INTEGER NOT NULL,
ehr_id INTEGER,
CONSTRAINT path_value_pkey PRIMARY KEY(id)
) WITHOUT OIDS;

Best regards
Seref

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rob Sargentg 2012-05-17 14:19:38 Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Previous Message Sumit Raja 2012-05-17 08:32:53 Re: Postgres 9.0 Streaming Replication and Load Balancing?