Re: Slow query with join

From: Marc Watson <mark(dot)watson(at)jurisconcept(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slow query with join
Date: 2015-03-17 12:59:19
Message-ID: 02F1BBB48ABD3245A3BC519B57CFC8CA7E9E2A80@Exchange.JurisConcept.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>-----Message d'origine-----
>De : Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>Envoyé : March-16-15 5:07 PM
>À : Tomas Vondra
>Cc : pgsql-general(at)postgresql(dot)org; Marc Watson
>Objet : Re: [GENERAL] Slow query with join
>
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> On 16.3.2015 19:50, Marc Watson wrote:
>>> I hope someone can help me with a problem I'm having when joining a
>>> view with a table. The view is somewhat involved, but I can provide the
>>> details if necessary
>
>> First, get rid of the ORDER BY clauses in the subselects - it's
>> completely pointless, and might prevent proper optimization (e.g.
>> replacing the IN() with optimized joins.
>
>I'm suspicious that the cause may be an ORDER BY in the view. It's
>hard to tell when we've not seen the view definition, but I see that both
>plans we've been shown are going to produce output sorted by actor.id.
>Maybe that's happenstance, or maybe not.
>
> regards, tom lane

Thanks for replying.
First of all, there is an ORDER BY in the view - ORDER BY actor.id. Removing this, the query takes about 47 secs., which is also the same as if I simply do a select * from v_actor, with or without the ORDER BY in the view.
It is something in the view, which I will explore further. If I create a materialized view mv_actor, select * from mv_actor takes 8 sec and my original query using the materialized view takes 16ms.

David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>You might want to consider whether the following is acceptable; but it would depend on the relationship between f_intervenant_ref and v_actor:
>
>SELECT *
>FROM v_actor
>JOIN f_intervenant_ref ON (actor_id = ir_actor_id)
>WHERE ir_dos_id = '5226'

The relationship between f_intervenant_ref and v_actor is simple. ir_actor_id is an integer that contains the value from the table actor.id, a serial and the primary key. This gives the same 7 secs, or 49 secs with out the ORDER BY in the view. The explain analyze output (with the ORDER BY in the view) is :
"Merge Join (cost=54.78..108097.04 rows=8 width=1540) (actual time=6925.158..6962.631 rows=8 loops=1)"
" Merge Cond: (actor.id = f_intervenant_ref.ir_actor_id)"
" -> Merge Left Join (cost=0.85..554262.60 rows=20670 width=138) (actual time=2.840..6960.625 rows=3072 loops=1)"
" Merge Cond: (actor.id = physical_actor.id)"
" -> Merge Left Join (cost=0.57..1665.30 rows=20670 width=114) (actual time=0.014..3.791 rows=3072 loops=1)"
" Merge Cond: (actor.id = moral_actor.id)"
" -> Index Scan using actor_pkey on actor (cost=0.29..1275.50 rows=20670 width=78) (actual time=0.006..1.667 rows=3072 loops=1)"
" -> Index Scan using moral_actor_pkey on moral_actor (cost=0.28..268.78 rows=5548 width=40) (actual time=0.005..0.005 rows=1 loops=1)"
" -> Index Scan using physical_actor_pkey on physical_actor (cost=0.29..725.98 rows=15122 width=28) (actual time=0.003..3.072 rows=3072 loops=1)"
" SubPlan 1"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (never executed)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (never executed)"
" Sort Key: contact.rank"
" -> Index Scan using contact_actor_idx on contact (cost=0.28..8.29 rows=1 width=8) (never executed)"
" Index Cond: (actor_id = actor.id)"
" SubPlan 2"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=3072)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3072)"
" Sort Key: contact_1.rank"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using contact_actor_idx on contact contact_1 (cost=0.28..8.29 rows=1 width=8) (actual time=0.002..0.003 rows=0 loops=3072)"
" Index Cond: (actor_id = actor.id)"
" -> Index Scan using ir_actor_id_idx on f_intervenant_ref (cost=0.28..62.58 rows=8 width=79) (actual time=0.386..0.449 rows=8 loops=1)"
" Filter: ((ir_dos_id)::text = '5226'::text)"
" Rows Removed by Filter: 891"
"Planning time: 0.864 ms"
"Execution time: 6962.773 ms"

>The other option is to use EXISTS:
>...
Same 7 secs.

A scaled-down version of my view runs quickly, so I'll gradually start inserting the missing pieces until I find the culprit. I'll let you know the cause when I find it.
Thx again,
Mark Watson

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2015-03-17 13:03:19 Re: pgpool + BDR, is it possible?
Previous Message Deole, Pushkar (Pushkar) 2015-03-17 12:33:03 Re: BDR with Postgres