Re: LEFT JOINs takes forever...almost.

From: "Nikolaus Dilger" <nikolaus(at)dilger(dot)cc>
To: niclas(dot)gustafsson(at)codesense(dot)com
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: LEFT JOINs takes forever...almost.
Date: 2002-09-14 03:38:47
Message-ID: 20020913203850.11569.h002.c001.wm@mail.dilger.cc.criticalpath.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Niclas,

You must be pretty impatient if 3 seconds is forever...

Anyway, the reason why the outer join takes so much
longer than the inner join is pretty simple. The
database has to look through many more rows for your
query. The more outer joins you have the worse it
gets. And your explain plan clearly shows that.

Regards
Nikolaus

From the PostgreSQL documentation
LEFT [ OUTER ] JOIN

First, an INNER JOIN is performed. Then, for each row
in T1 that does not satisfy the join condition with any
row in T2, an additional joined row is returned with
null fields in the columns from T2.

Tip: The joined table unconditionally has a row for
each row in T1.

On Fri, 13 September 2002, "Niclas Gustafsson" wrote:

>
> Hi, help needed!
>
> I'm having a slight performace problem here, i'm using
> pg 7.2.1 on a
> Linux(RH.7.2).
> My query that is causing me problems looks like this:
>
> SELECT v.*,vd.vessel_name,vd.start_date as
> vd_start_date,vd.end_date as
> vd_end_date,d.firstname||' '||d.lastname as
> customer_name
> FROM vessel as v
> LEFT JOIN vesseldebtor as vd ON
v.callsign=vd.callsign
> LEFT JOIN comment as c ON v.comment_id=c.comment_id
> LEFT JOIN debtor as d ON vd.customer_id=d.customer_id
> WHERE vd.customer_id = '35034694'
> ORDER BY vd.start_date ASC
>
> As we see there is four tables in the query, the
> attentive reader
> notices that
> I don't select anything from the table comment, but
I'm
> going to do that
> later
> on so please disregard that for now.
>
> The four tables are:
> vessel, vesseldebtor, debtor and comment.
>
> When running a Explain Anaylze on the query this is
> returned:
>
> Sort (cost=24333.42..24333.42 rows=41392 width=157)
> (actual
> time=3181.32..3181.33 rows=3 loops=1)
> -&gt; Hash Join (cost=3083.88..15830.55 rows=41392
> width=157) (actual
> time=2943.43..3181.23 rows=3 loops=1)
> -&gt; Hash Join (cost=1869.92..11545.77
> rows=41392 width=153)
> (actual time=2357.00..2672.09 rows=3 loops=1)
> -&gt; Merge Join (cost=0.00..6737.57
> rows=41392 width=128)
> (actual time=1683.54..1787.73 rows=3 loops=1)
> -&gt; Index Scan using
vessel_pkey on
> vessel v
> (cost=0.00..2498.30 rows=41392 width=95) (actual
> time=0.22..617.74
> rows=41392 loops=1)
> -&gt; Index Scan using
> vesseldebtor_pkey on
> vesseldebtor vd (cost=0.00..3289.35 rows=48368
> width=33) (actual
> time=0.21..495.45 rows=48368 loops=1)
> -&gt; Hash (cost=1504.94..1504.94
> rows=41194 width=25)
> (actual time=408.73..408.73 rows=0 loops=1)
> -&gt; Seq Scan on debtor d
> (cost=0.00..1504.94
> rows=41194 width=25) (actual time=0.06..184.87
> rows=41194 loops=1)
> -&gt; Hash (cost=979.16..979.16 rows=39516
> width=4) (actual
> time=221.50..221.50 rows=0 loops=1)
> -&gt; Seq Scan on comment c
> (cost=0.00..979.16 rows=39516
> width=4) (actual time=0.05..82.59 rows=39516 loops=1)
> Total runtime: 3181.60 msec
>
> Whereas a similar query using INNER joins like this:
>
> EXPLAIN ANALYZE SELECT
v.*,vd.vessel_name,vd.start_date
> as
> vd_start_date,vd.end_date as
> vd_end_date,d.firstname||' '||d.lastname as
> customer_name
> FROM vessel as v, vesseldebtor as vd, comment as c,
> debtor as d
> WHERE
> v.callsign=vd.callsign AND
> v.comment_id=c.comment_id AND
> vd.customer_id=d.customer_id AND
> vd.customer_id = '35034694'
> ORDER BY vd.start_date ASC
>
> And the output:
>
> Sort (cost=54.29..54.29 rows=4 width=157) (actual
> time=1.39..1.39
> rows=3 loops=1)
> -&gt; Nested Loop (cost=0.00..54.25 rows=4
width=157)
> (actual
> time=0.83..1.31 rows=3 loops=1)
> -&gt; Nested Loop (cost=0.00..40.40 rows=4
> width=132) (actual
> time=0.63..1.04 rows=3 loops=1)
> -&gt; Nested Loop (cost=0.00..29.27
rows=4
> width=128)
> (actual time=0.44..0.70 rows=3 loops=1)
> -&gt; Index Scan using
> vd_customer_id_idx on
> vesseldebtor vd (cost=0.00..8.45 rows=4 width=33)
> (actual
> time=0.22..0.31 rows=3 loops=1)
> -&gt; Index Scan using
vessel_pkey on
> vessel v
> (cost=0.00..5.71 rows=1 width=95) (actual
> time=0.11..0.12 rows=1
> loops=3)
> -&gt; Index Scan using comment_pkey on
> comment c
> (cost=0.00..3.05 rows=1 width=4) (actual
> time=0.10..0.10 rows=1 loops=3)
> -&gt; Index Scan using debtor_pkey on debtor
d
> (cost=0.00..3.79
> rows=1 width=25) (actual time=0.07..0.07 rows=1
loops=3)
> Total runtime: 1.64 msec
>
> EXPLAIN
>
> Can someone please explain why the first query is 3000
> times slower?
> Surely there must be a any way to speed it up?
> If the solution is not that trivial, someone please
> tell me so that I
> can describe the columns and indexes more in detail.
>
>
> Regards,
>
> Niclas Gustafsson
> CodeSense AB
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo(at)postgresql(dot)org

Browse pgsql-admin by date

  From Date Subject
Next Message shreedhar 2002-09-14 04:27:28 Is it possible to get the return value of trigger function using PHP or any other method.
Previous Message Marc Mitchell 2002-09-13 21:04:55 Re: Monitoring Locks in Databases