Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?

From: Seref Arikan <serefarikan(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Date: 2012-05-17 17:40:16
Message-ID: CAG1bHGO-2uXAP_xhL6A1dkUhz6vtW=GvV7qvomaNEgw02mvkHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Trying to reply to Rob:

Apologies if this does not end up in the thread (gmail is just driving me
mad, I can't seem to receive messages, so I've subscribed again)

For some reason Limit 1 cause my query to go on for minutes without a
response, which was not the case.

The following query takes about 10.6 seconds, with a table that has about
20 million rows. First, the indices of the table with create statement:

CREATE TABLE public.path_value (
val_string TEXT,
feature_mapping_id INTEGER NOT NULL,
parent_feature_mapping_id INTEGER,
feature_name TEXT,
rm_type_name TEXT,
path INTEGER NOT NULL,
payload_id INTEGER NOT NULL,
id INTEGER NOT NULL,
ehr_id INTEGER,
CONSTRAINT path_value_pkey PRIMARY KEY(id)
) WITHOUT OIDS;

CREATE INDEX path_value_idx ON public.path_value
USING btree (rm_type_name COLLATE pg_catalog."default", feature_name
COLLATE pg_catalog."default");

CREATE INDEX path_value_idx1 ON public.path_value
USING btree (feature_mapping_id, parent_feature_mapping_id);

CREATE INDEX path_value_idx2 ON public.path_value
USING btree (payload_id);

Now results of EXPLAIN ANALYZE:

"QUERY PLAN"
"Nested Loop Semi Join (cost=543422.34..571013.45 rows=5 width=4) (actual
time=5854.267..10684.798 rows=82003 loops=1)"
" Join Filter: (root.feature_mapping_id =
anodeid.parent_feature_mapping_id)"
" -> Merge Join (cost=543422.34..543426.29 rows=12 width=24) (actual
time=5854.182..5953.531 rows=82003 loops=1)"
" Merge Cond: ((root.feature_mapping_id =
node1.parent_feature_mapping_id) AND (root.payload_id = node2.payload_id))"
" -> Sort (cost=1150.65..1151.35 rows=283 width=12) (actual
time=359.940..380.707 rows=82003 loops=1)"
" Sort Key: root.feature_mapping_id, root.payload_id"
" Sort Method: external merge Disk: 1768kB"
" -> Index Scan using path_value_idx on path_value root
(cost=0.00..1139.12 rows=283 width=12) (actual time=0.073..261.240
rows=82003 loops=1)"
" Index Cond: ((rm_type_name = 'COMPOSITION'::text) AND
(feature_name = 'composition'::text))"
" -> Sort (cost=542271.69..542272.31 rows=247 width=12) (actual
time=5494.236..5516.788 rows=82003 loops=1)"
" Sort Key: node1.parent_feature_mapping_id, node2.payload_id"
" Sort Method: external sort Disk: 2088kB"
" -> HashAggregate (cost=542259.40..542261.87 rows=247
width=12) (actual time=5377.761..5394.453 rows=82003 loops=1)"
" -> Hash Semi Join (cost=506297.37..542253.38
rows=1205 width=12) (actual time=2788.758..5325.838 rows=164004 loops=1)"
" Hash Cond: ((node1.payload_id =
node2.payload_id) AND (node1.feature_mapping_id =
node2.parent_feature_mapping_id))"
" -> Bitmap Heap Scan on path_value node1
(cost=77.61..9421.62 rows=2464 width=12) (actual time=76.628..2430.473
rows=246005 loops=1)"
" Recheck Cond: ((rm_type_name =
'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
" -> Bitmap Index Scan on path_value_idx
(cost=0.00..77.00 rows=2464 width=0) (actual time=75.640..75.640
rows=246005 loops=1)"
" Index Cond: ((rm_type_name =
'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
" -> Hash (cost=498674.38..498674.38 rows=399092
width=8) (actual time=2711.653..2711.653 rows=410015 loops=1)"
" Buckets: 4096 Batches: 16 Memory Usage:
1013kB"
" -> Bitmap Heap Scan on path_value node2
(cost=10345.32..498674.38 rows=399092 width=8) (actual
time=71.718..2607.467 rows=410015 loops=1)"
" Recheck Cond: (rm_type_name =
'ITEMSTRUCTURE'::text)"
" -> Bitmap Index Scan on
path_value_idx (cost=0.00..10245.55 rows=399092 width=0) (actual
time=69.634..69.634 rows=410015 loops=1)"
" Index Cond: (rm_type_name =
'ITEMSTRUCTURE'::text)"
" -> Index Scan using path_value_idx2 on path_value anodeid
(cost=0.00..2298.91 rows=1 width=8) (actual time=0.057..0.057 rows=1
loops=82003)"
" Index Cond: (payload_id = node2.payload_id)"
" Filter: ((feature_name = 'archetypeNodeId'::text) AND (val_string
= 'openEHR-EHR-COMPOSITION.discharge.v1'::text))"
"Total runtime: 10692.277 ms"

and finaly, the query that gave this result:

EXPLAIN ANALYZE SELECT root.id from path_value as root
WHERE
root.rm_type_name = 'COMPOSITION'
AND
root.feature_name = 'composition'
AND
EXISTS (SELECT anodeid.id from path_value as anodeid
WHERE
anodeId.parent_feature_mapping_id =
root.feature_mapping_id
AND
anodeId.payload_id = root.payload_id
AND
anodeId.feature_name = 'archetypeNodeId'
AND
anodeId.val_string =
'openEHR-EHR-COMPOSITION.discharge.v1'
)

AND
EXISTS (SELECT 1 from path_value as node1
WHERE
node1.payload_id = root.payload_id
AND
node1.parent_feature_mapping_id = root.feature_mapping_id
AND
node1.feature_name = 'content'
AND
node1.rm_type_name = 'CONTENTITEM'
AND
EXISTS (SELECT 1 from path_value as node2
WHERE
node2.payload_id = node1.payload_id
AND
node2.parent_feature_mapping_id =
node1.feature_mapping_id
AND
node2.rm_type_name = 'ITEMSTRUCTURE'
)
)

Is there a glaring error in my approach? Should I be better off with
another SQL query, or Ltree/XPATH queries?

Best regards
Seref

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2012-05-17 18:12:15 Re: cannot compile www_fdw Foreign Data Wrapper
Previous Message Adrian Schreyer 2012-05-17 15:45:51 cannot compile www_fdw Foreign Data Wrapper