Re: estimates for nested loop very wrong?

From: Antti Haapala <antti(dot)haapala(at)iki(dot)fi>
To: joostje(at)komputilo(dot)org
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: estimates for nested loop very wrong?
Date: 2003-04-10 20:50:19
Message-ID: Pine.GSO.4.44.0304102335170.10376-100000@paju.oulu.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


On Thu, 10 Apr 2003 joostje(at)komputilo(dot)org wrote:

> Je 2003/04/10(4)/12:04, Tom Lane skribis:
> > joostje(at)komputilo(dot)org writes:
> > > Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
> > > count(distinct(id)), but it obviously isn't. Also the most_common_freqs
> > > values are about a 100 times higher than in reality, and, even tough about
> > > 900 values of id occur more often than 40 times, in the 'most_common_vals'
> > > list are 7 (of the 10) vals that occur less than 40 times, and the real
> > > top two isn't even represented.
> >
> > Please try increasing the statistics target (see ALTER TABLE) for db.id, then
> > re-analyze and see if the estimates get better. The default setting is
> > 10 --- try 20, 50, 100 to see what happens.
>
> Well, the n_distinct estimates get better, but the cost estimates still
> don't quite add up: `actual cost' is 23.24, cost estimate never gets
> below 49930.
>
> stat.targ n_distinct| correlation cost estimate
> 5 1917 | 0.43189 3621794.92
> 10 1998 | 0.3909 3618363.33
> 20 4330 | -0.247617 1981594.38
> 50 9708 | 0.0762642 975847.15
> 100 14604 | 0.030706 657631.41
> 200 21855 | 0.0446929 204335.70
> 500 39980 | -0.0497829 121000.31
> 1000 29468 | 0.0366528 49930.08
> 1000 29453 | 0.0367673 49954.08
> Table 1: various estimates as a function of statistical target
> actual distinct values: 42226
> actual cost: varies from 5.0 to 27.8
>
> So, the planner still prefers the mergejoin and hashjoin plans, causing
> the select to take tens of seconds (60 for the mergejoin, I beleve), wheras
> the Nested Loop takes only 0.024 seconds:
>
> For example, for the stat.targ=500 run:
> => explain analyse SELECT id from db, tmp0 WHERE valida AND poseda='uea' AND tab='pers' AND tmp0.v0=id ;
> NOTICE: QUERY PLAN:
>
> Nested Loop (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1)
> -> Seq Scan on tmp0 (cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1)
> -> Index Scan using db_id_idx on db (cost=0.00..120.63 rows=28 width=7) (actual time=0.27..0.75 rows=14 loops=29)
> Total runtime: 23.92 msec
>
> In the above example, tmp0 had 29 values, that correspond to 415 rows in table db.
> Table db has 586157 rows.

Just a thought: is tmp0 analyzed? (1000 rows vs 29). :) Wouldn't this
divide cost estimate by at least 30? (ok... it's still high...).

--
Antti Haapala

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message joostje 2003-04-10 22:20:56 Re: estimates for nested loop very wrong?
Previous Message Tom Lane 2003-04-10 20:47:52 Re: estimates for nested loop very wrong?