From: | joostje(at)komputilo(dot)org |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | estimates for nested loop very wrong? |
Date: | 2003-04-10 11:21:44 |
Message-ID: | 20030410112144.GA22716@co.uea.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi,
When JOINing two tabels (one with 23 entries, one with 2.5e6 entries),
psql estimates the cost of the nested loop method way to high, causing
it to use Hash Join, even though Hash Join actually takes 30 seconds,
agianst 0.020 seconds for Nested Loop.
It really puzzles me why the estimate for the Nested Loop is so bad,
as it looks like a rather easy thing to estimate...
Below:
table db has 2.5e6 entries, column "id" has rather evenly distributed values.
column id has a (btree) index.
table tmp1 has 23 entries, column "v0" has all unique entries.
psql version: 7.2.1
Here is the query twice, once with enable_hashjoin ON, second time
with enable_hashjoin OFF, to force psql to use Nested Loop
ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
NOTICE: QUERY PLAN:
Hash Join (cost=1.29..67863.71 rows=61140 width=38) (actual time=4475.26..32442.99 rows=756 loops=1)
-> Seq Scan on db (cost=0.00..54498.12 rows=2520012 width=31) (actual time=0.07..29170.62 rows=2520012 loops=1)
-> Hash (cost=1.23..1.23 rows=23 width=7) (actual time=0.25..0.25 rows=0 loops=1)
-> Seq Scan on tmp1 (cost=0.00..1.23 rows=23 width=7) (actual time=0.17..0.22 rows=23 loops=1)
Total runtime: 32443.78 msec
--Setting hashjoin off, forcing psql to use the Nested Loop
ueadb=> set enable_hashjoin = off;
ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
NOTICE: QUERY PLAN:
Nested Loop (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1)
-> Seq Scan on tmp1 (cost=0.00..1.23 rows=23 width=7) (actual time=0.24..0.39 rows=23 loops=1)
-> Index Scan using db_id_idx on db (cost=0.00..9021.35 rows=2658 width=31) (actual time=0.32..0.69 rows=33 loops=23)
Total runtime: 19.20 msec
I guess I'll be doing my queries with enable_hashjoin OFF, but is there anythign I'm
doing wrong?
(Apart from maybe uzing psql 7.2.1 -- would 7.3 be smarter here?)
From | Date | Subject | |
---|---|---|---|
Next Message | Ries van Twisk | 2003-04-10 12:24:00 | Re: The need to know if a field is using/connected to a sequence |
Previous Message | Olleg Samojlov | 2003-04-10 09:14:55 | Re: create databases with plpgsql |