estimates for nested loop very wrong?

From: joostje(at)komputilo(dot)org
To: pgsql-sql(at)postgresql(dot)org
Subject: estimates for nested loop very wrong?
Date: 2003-04-10 11:21:44
Message-ID: 20030410112144.GA22716@co.uea.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi,

When JOINing two tabels (one with 23 entries, one with 2.5e6 entries),
psql estimates the cost of the nested loop method way to high, causing
it to use Hash Join, even though Hash Join actually takes 30 seconds,
agianst 0.020 seconds for Nested Loop.

It really puzzles me why the estimate for the Nested Loop is so bad,
as it looks like a rather easy thing to estimate...

Below:
table db has 2.5e6 entries, column "id" has rather evenly distributed values.
column id has a (btree) index.
table tmp1 has 23 entries, column "v0" has all unique entries.
psql version: 7.2.1

Here is the query twice, once with enable_hashjoin ON, second time
with enable_hashjoin OFF, to force psql to use Nested Loop

ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
NOTICE: QUERY PLAN:

Hash Join (cost=1.29..67863.71 rows=61140 width=38) (actual time=4475.26..32442.99 rows=756 loops=1)
-> Seq Scan on db (cost=0.00..54498.12 rows=2520012 width=31) (actual time=0.07..29170.62 rows=2520012 loops=1)
-> Hash (cost=1.23..1.23 rows=23 width=7) (actual time=0.25..0.25 rows=0 loops=1)
-> Seq Scan on tmp1 (cost=0.00..1.23 rows=23 width=7) (actual time=0.17..0.22 rows=23 loops=1)
Total runtime: 32443.78 msec

--Setting hashjoin off, forcing psql to use the Nested Loop
ueadb=> set enable_hashjoin = off;

ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1)
-> Seq Scan on tmp1 (cost=0.00..1.23 rows=23 width=7) (actual time=0.24..0.39 rows=23 loops=1)
-> Index Scan using db_id_idx on db (cost=0.00..9021.35 rows=2658 width=31) (actual time=0.32..0.69 rows=33 loops=23)
Total runtime: 19.20 msec

I guess I'll be doing my queries with enable_hashjoin OFF, but is there anythign I'm
doing wrong?
(Apart from maybe uzing psql 7.2.1 -- would 7.3 be smarter here?)

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Ries van Twisk 2003-04-10 12:24:00 Re: The need to know if a field is using/connected to a sequence
Previous Message Olleg Samojlov 2003-04-10 09:14:55 Re: create databases with plpgsql