Help interpreting the output of EXPLAIN

From: "Mayers, Philip J" <p(dot)mayers(at)ic(dot)ac(dot)uk>
To: "'pgsql-general(at)postgresql(dot)org'" <pgsql-general(at)postgresql(dot)org>
Subject: Help interpreting the output of EXPLAIN
Date: 2000-12-08 18:19:37
Message-ID: A0F836836670D41183A800508BAF190B35E231@icex1.cc.ic.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

All,

I have an 4-5 table SQL database where I do queries of various expense. The
most expensive query involves a UNION of two inner joins. I'm having trouble
interpreting the output of the EXPLAIN in order to optimise it. Some
problems I'm having:

1) The costs seem to change radically without me taking any action - do I
have to run them on an unloaded machine?
2) VACUUM seemed to massively increase the cost
3) VACUUM ANALYZE decreased it - so, should I always do "VACUUM; VACUUM
ANALYZE;"
4) I'm using LIMIT after the outer join - should I LIMIT the two inner joins
to the same amount, since or will Postgres do this itself?

After a VACUUM ANALYZE and using the default indexing scheme (probably
what's slowing me down) EXPLAIN gives this:

1 Unique (cost=27403.90..27559.07 rows=1034 width=84)
2 -> Sort (cost=27403.90..27403.90 rows=10345 width=84)
3 -> Append (cost=716.18..26447.76 rows=10345 width=84)
4 -> Hash Join (cost=716.18..20480.67 rows=9209 width=72)
5 -> Nested Loop (cost=0.00..18931.96 rows=9209
width=60)
6 -> Seq Scan on host (cost=0.00..211.09
rows=9209 width=42)
7 -> Index Scan using interface_pkey on
interface (cost=0.00..2.02 rows=1 width=18)
8 -> Hash (cost=604.94..604.94 rows=17294 width=12)
9 -> Seq Scan on machine (cost=0.00..604.94
rows=17294 width=12)
10 -> Hash Join (cost=4656.18..5967.08 rows=1136 width=84)
11 -> Seq Scan on machine (cost=0.00..604.94
rows=17294 width=12)
12 -> Hash (cost=4653.34..4653.34 rows=1136
width=72)
13 -> Nested Loop (cost=0.00..4653.34
rows=1136 width=72)
14 -> Nested Loop (cost=0.00..2343.98
rows=1136 width=54)
15 -> Seq Scan on alias
(cost=0.00..26.36 rows=1136 width=36)
16 -> Index Scan using host_pkey on
host (cost=0.00..2.03 rows=1 width=18)
17 -> Index Scan using interface_pkey on
interface (cost=0.00..2.02 rows=1 width=18)

Am I right in thinking that the loop on line 5 is costing most, and what's
costing most under that is the sequential scan on host? That corresponds to
a join of the form "host.mac = interface.mac", so I should hash index the
host.mac column? Unfortunately, mac is of type "macaddr" and this happens:

hdb=> create index host_mac on host using hash (mac);
ERROR: fmgr_info: function 0: cache lookup failed
hdb=>

I'm more used to optimising C than SQL - how can you tell from an explain
which type of index will decrease the cost (or how to restructure your query
to lower the cost). Will postgres order WHERE clauses such that the least
expensive happens first?

Regards,
Phil

+----------------------------------+
| Phil Mayers, Network Support |
| Centre for Computing Services |
| Imperial College |
+----------------------------------+

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2000-12-08 18:38:48 Re: Problems with vacuum
Previous Message The Hermit Hacker 2000-12-08 18:03:54 Re: Indexing for geographic objects?