Re: estimates for nested loop very wrong?

From: joostje(at)komputilo(dot)org
To: pgsql-sql(at)postgresql(dot)org
Cc: joostje(at)komputilo(dot)org
Subject: Re: estimates for nested loop very wrong?
Date: 2003-04-10 19:45:53
Message-ID: 20030410194553.GB28537@co.uea.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Je 2003/04/10(4)/12:04, Tom Lane skribis:
> joostje(at)komputilo(dot)org writes:
> > Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
> > count(distinct(id)), but it obviously isn't. Also the most_common_freqs
> > values are about a 100 times higher than in reality, and, even tough about
> > 900 values of id occur more often than 40 times, in the 'most_common_vals'
> > list are 7 (of the 10) vals that occur less than 40 times, and the real
> > top two isn't even represented.
>
> Please try increasing the statistics target (see ALTER TABLE) for db.id, then
> re-analyze and see if the estimates get better. The default setting is
> 10 --- try 20, 50, 100 to see what happens.

Well, the n_distinct estimates get better, but the cost estimates still
don't quite add up: `actual cost' is 23.24, cost estimate never gets
below 49930.

stat.targ n_distinct| correlation cost estimate
5 1917 | 0.43189 3621794.92
10 1998 | 0.3909 3618363.33
20 4330 | -0.247617 1981594.38
50 9708 | 0.0762642 975847.15
100 14604 | 0.030706 657631.41
200 21855 | 0.0446929 204335.70
500 39980 | -0.0497829 121000.31
1000 29468 | 0.0366528 49930.08
1000 29453 | 0.0367673 49954.08
Table 1: various estimates as a function of statistical target
actual distinct values: 42226
actual cost: varies from 5.0 to 27.8

So, the planner still prefers the mergejoin and hashjoin plans, causing
the select to take tens of seconds (60 for the mergejoin, I beleve), wheras
the Nested Loop takes only 0.024 seconds:

For example, for the stat.targ=500 run:
=> explain analyse SELECT id from db, tmp0 WHERE valida AND poseda='uea' AND tab='pers' AND tmp0.v0=id ;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1)
-> Seq Scan on tmp0 (cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1)
-> Index Scan using db_id_idx on db (cost=0.00..120.63 rows=28 width=7) (actual time=0.27..0.75 rows=14 loops=29)
Total runtime: 23.92 msec

In the above example, tmp0 had 29 values, that correspond to 415 rows in table db.
Table db has 586157 rows.

The shown select statement is the one used to get all cost estimates
in table 1.

postgresql: 7.2.1 (debian release 3)

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2003-04-10 20:23:48 Re: estimates for nested loop very wrong?
Previous Message Richard Huxton 2003-04-10 17:54:54 Re: Trigger