Re: estimates for nested loop very wrong?

From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: <joostje(at)komputilo(dot)org>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: estimates for nested loop very wrong?
Date: 2003-04-10 14:17:03
Message-ID: 20030410071328.M79630-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Thu, 10 Apr 2003 joostje(at)komputilo(dot)org wrote:

> It really puzzles me why the estimate for the Nested Loop is so bad,
> as it looks like a rather easy thing to estimate...
>
> Below:
> table db has 2.5e6 entries, column "id" has rather evenly distributed values.
> column id has a (btree) index.
> table tmp1 has 23 entries, column "v0" has all unique entries.
> psql version: 7.2.1
>
> Here is the query twice, once with enable_hashjoin ON, second time
> with enable_hashjoin OFF, to force psql to use Nested Loop
>
> ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
> NOTICE: QUERY PLAN:
>
> Hash Join (cost=1.29..67863.71 rows=61140 width=38) (actual time=4475.26..32442.99 rows=756 loops=1)
> -> Seq Scan on db (cost=0.00..54498.12 rows=2520012 width=31) (actual time=0.07..29170.62 rows=2520012 loops=1)
> -> Hash (cost=1.23..1.23 rows=23 width=7) (actual time=0.25..0.25 rows=0 loops=1)
> -> Seq Scan on tmp1 (cost=0.00..1.23 rows=23 width=7) (actual time=0.17..0.22 rows=23 loops=1)
> Total runtime: 32443.78 msec
>
>
> --Setting hashjoin off, forcing psql to use the Nested Loop
> ueadb=> set enable_hashjoin = off;
>
>
> ueadb=> explain analyse select id, var, val from db, tmp1 where id=tmp1.v0;
> NOTICE: QUERY PLAN:
>
> Nested Loop (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1)
> -> Seq Scan on tmp1 (cost=0.00..1.23 rows=23 width=7) (actual time=0.24..0.39 rows=23 loops=1)
> -> Index Scan using db_id_idx on db (cost=0.00..9021.35 rows=2658 width=31) (actual time=0.32..0.69 rows=33 loops=23)

It seems to be misestimating the number of rows to return on db. That's
probably why the cost is so wrong (it's over estimating by nearly a factor
of 100). Have you analyzed db recently?

> I guess I'll be doing my queries with enable_hashjoin OFF, but is there anythign I'm
> doing wrong?
> (Apart from maybe uzing psql 7.2.1 -- would 7.3 be smarter here?)

It might.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2003-04-10 14:17:09 Re: estimates for nested loop very wrong?
Previous Message Christoph Haller 2003-04-10 13:21:06 Re: Extraordinary Full Join