Re: slow sub-query problem

From: Tim Dudgeon <tdudgeon(dot)ml(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: slow sub-query problem
Date: 2014-11-18 08:57:23
Message-ID: 546B09F3.90609@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Dave,

thanks for the suggestion. I was trying to work on that basis.
Eventually I got this that works quite well:

SELECT t1.id <http://t1.id>, t1.structure_id, t1.batch_id,
t1.property_id, t1.property_data
FROM chemcentral.structure_props t1
JOIN chemcentral.structure_props t2 ON t1.id <http://t1.id> = t2.id
<http://t2.id>
WHERE t2.structure_id IN (SELECT structure_id FROM
chemcentral.structure_props WHERE property_id = 643413)
AND t1.property_id IN (1, 643413, 1106201)
;

which has this plan.

"Hash Join (cost=4376.38..6539.42 rows=43 width=648) (actual
time=467.265..795.887 rows=381 loops=1)"
" Hash Cond: (t2.id <http://t2.id> = t1.id <http://t1.id>)"
" -> Nested Loop (cost=1092.16..1352.77 rows=507201 width=4) (actual
time=0.807..84.228 rows=173867 loops=1)"
" -> HashAggregate (cost=1091.73..1091.75 rows=2 width=4)
(actual time=0.779..0.897 rows=366 loops=1)"
" Group Key: structure_props.structure_id"
" -> Index Scan using idx_sp_property_id on
structure_props (cost=0.43..1090.77 rows=382 width=4) (actual
time=0.032..0.592 rows=369 loops=1)"
" Index Cond: (property_id = 643413)"
" -> Index Scan using idx_sp_structure_id on structure_props t2
(cost=0.43..127.34 rows=317 width=8) (actual time=0.010..0.172
rows=475 loops=366)"
" Index Cond: (structure_id = structure_props.structure_id)"
" -> Hash (cost=3269.89..3269.89 rows=1146 width=648) (actual
time=464.458..464.458 rows=811892 loops=1)"
" Buckets: 1024 Batches: 32 (originally 1) Memory Usage: 4097kB"
" -> Index Scan using idx_sp_property_id on structure_props t1
(cost=0.44..3269.89 rows=1146 width=648) (actual time=0.033..231.895
rows=811892 loops=1)"
" Index Cond: (property_id = ANY
('{1,643413,1106201}'::integer[]))"
"Planning time: 0.885 ms"

It looks a little strange to me, but it works much better.

Tim

On 17/11/2014 19:19, David Johnston wrote:
> Please reply to the list...
>
> In short...
>
> tablea as t1 join tablea as t2 on t1.id <http://t1.id> = t2.id
> <http://t2.id>
>
> A natural key prevents duplicate real data which a serially generated
> made up key does not.
>
> David J.
>
> On Monday, November 17, 2014, Tim Dudgeon <tdudgeon(dot)ml(at)gmail(dot)com
> <mailto:tdudgeon(dot)ml(at)gmail(dot)com>> wrote:
>
>
> On 17/11/2014 18:44, David G Johnston wrote:
>
> Tim Dudgeon wrote
>
> All relevant columns are indexed and using PostgreSQL 9.4.
> Any clues how to re-write it to avoid the slow sub-query.
>
> Try using an actual join instead of a subquery. You will have
> to provide
> aliases and then setup the where clause appropriately.
>
> I'm trying to go in that direction but in the query is entirely
> within one table, so I need to join the table to itself? I've been
> trying this but not getting it to work yet.
>
>
> I am reading the query correctly in that the repeated
> reference to 643413 is
> redundant?
>
> In this example its sort of redundant, but in a real world case
> the query for structure_id and property_id are independent and may
> have nothing in common.
>
> The lack of a defined natural primary key makes blind reasoning
> difficult.
>
>
> The id column is the primary key.
>
> Tim
>
>
> David J.
>
>
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/slow-sub-query-problem-tp5827273p5827275.html
> Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
>
>
>

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David G Johnston 2014-11-19 01:37:01 Re: slow sub-query problem
Previous Message Tim Dudgeon 2014-11-18 08:53:38 Re: slow sub-query problem