slow sub-query problem

From: Tim Dudgeon <tdudgeon(dot)ml(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: slow sub-query problem
Date: 2014-11-17 18:34:34
Message-ID: 546A3FBA.9020901@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I'm having problems optimising a query that's very slow due to a sub-query.
The query is this:

SELECT
structure_id, batch_id, property_id, property_data
FROM chemcentral.structure_props
WHERE structure_id IN (SELECT structure_id FROM
chemcentral.structure_props WHERE property_id = 643413)
AND property_id IN (1, 643413, 1106201);

and it takes 18s to execute.
It I replace the sub-query with the inlined 369 values so that the 4th
line looks like this:

WHERE structure_id IN (1122687,309004,306064 ...)

it takes a few ms.

The plans are:
1. sub-query

"Nested Loop (cost=1132.97..1182.28 rows=43 width=644) (actual
time=70.926..18937.669 rows=381 loops=1)"
" -> HashAggregate (cost=1091.73..1091.75 rows=2 width=4) (actual
time=2.829..3.212 rows=366 loops=1)"
" Group Key: structure_props_1.structure_id"
" -> Index Scan using idx_sp_property_id on structure_props
structure_props_1 (cost=0.43..1090.77 rows=382 width=4) (actual
time=0.033..2.380 rows=369 loops=1)"
" Index Cond: (property_id = 643413)"
" -> Bitmap Heap Scan on structure_props (cost=41.24..45.26 rows=1
width=644) (actual time=51.726..51.727 rows=1 loops=366)"
" Recheck Cond: ((structure_id = structure_props_1.structure_id)
AND (property_id = ANY ('{1,643413,1106201}'::integer[])))"
" Heap Blocks: exact=381"
" -> BitmapAnd (cost=41.24..41.24 rows=1 width=0) (actual
time=51.714..51.714 rows=0 loops=366)"
" -> Bitmap Index Scan on idx_sp_structure_id
(cost=0.00..6.80 rows=317 width=0) (actual time=0.046..0.046 rows=475
loops=366)"
" Index Cond: (structure_id =
structure_props_1.structure_id)"
" -> Bitmap Index Scan on idx_sp_property_id
(cost=0.00..33.90 rows=1146 width=0) (actual time=51.656..51.656
rows=811892 loops=366)"
" Index Cond: (property_id = ANY
('{1,643413,1106201}'::integer[]))"
"Planning time: 0.497 ms"
"Execution time: 18937.868 ms"

2. inlined values

"Bitmap Heap Scan on structure_props (cost=2600.48..2645.29 rows=10
width=644) (actual time=71.676..72.724 rows=381 loops=1)"
" Recheck Cond: ((property_id = ANY ('{1,643413,1106201}'::integer[]))
AND (structure_id = ANY
('{1122687,309004,306064,278852,234066,1122645,412925,280033,423990,568929,448302,278487,278955,40430,40430,467979,467508,288413,289746,306073,355352,265583,4779
(...)"
" Heap Blocks: exact=381"
" -> BitmapAnd (cost=2600.48..2600.48 rows=10 width=0) (actual
time=71.608..71.608 rows=0 loops=1)"
" -> Bitmap Index Scan on idx_sp_property_id (cost=0.00..33.90
rows=1146 width=0) (actual time=54.614..54.614 rows=811892 loops=1)"
" Index Cond: (property_id = ANY
('{1,643413,1106201}'::integer[]))"
" -> Bitmap Index Scan on idx_sp_structure_id
(cost=0.00..2566.32 rows=117367 width=0) (actual time=14.487..14.487
rows=173867 loops=1)"
" Index Cond: (structure_id = ANY
('{1122687,309004,306064,278852,234066,1122645,412925,280033,423990,568929,448302,278487,278955,40430,40430,467979,467508,288413,289746,306073,355352,265583,477941,326652,326602,233964,15338,397586,1122647,3088
(...)"
"Planning time: 1.052 ms"
"Execution time: 72.858 ms"

Table is like this:

CREATE TABLE chemcentral.structure_props
(
id serial NOT NULL,
source_id integer NOT NULL,
structure_id integer NOT NULL,
batch_id character varying(16),
parent_id integer,
property_id integer NOT NULL,
property_data jsonb,
CONSTRAINT structure_props_pkey PRIMARY KEY (id)
)

All relevant columns are indexed and using PostgreSQL 9.4.
Any clues how to re-write it to avoid the slow sub-query.

Many thanks
Tim

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David G Johnston 2014-11-17 18:44:46 Re: slow sub-query problem
Previous Message Seb 2014-11-14 17:04:52 Re: matching against start/end times and diagnostic values