Re: estimates for nested loop very wrong?

From: joostje(at)komputilo(dot)org
To: pgsql-sql(at)postgresql(dot)org
Cc: joostje(at)komputilo(dot)org
Subject: Re: estimates for nested loop very wrong?
Date: 2003-04-10 16:29:02
Message-ID: 20030410162902.GA28537@co.uea.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Je 2003/04/10(4)/10:04, Tom Lane skribis:

> Have you done an ANALYZE or VACUUM ANALYZE recently?

Jes, both, actually, and the `analyse' quite a few times.

> > Nested Loop (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1)

> The planner is evidently estimating that each row of tmp1 will match 2600+
> rows of db, whereas in reality there is only one match. Rather than
> mess with enable_hashjoin, you need to find out why that estimate is so
> badly off. Are the entries in tmp1 specially selected to correspond to
> unique rows of db?

Well, each entry in tmp1 matches with about 7-80 entries in db, but yes
the problem indeed seems to be that the estimate is so far off.
And no, the entries in tmp1 are not specially selected, they correspond
to `normal' values of id in db (values that are about as frequent as
other values).

I have done VACUUM ANALYSE on the table (and drop index; create index db_id_idx on db(id);).
=> analyse db;
=> select n_distinct from pg_stats where tablename='db' and attname='id';
1996
=> select count(distinct(id)) from db;
42225

Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
count(distinct(id)), but it obviously isn't. Also the most_common_freqs
values are about a 100 times higher than in reality, and, even tough about
900 values of id occur more often than 40 times, in the 'most_common_vals'
list are 7 (of the 10) vals that occur less than 40 times, and the real
top two isn't even represented.

(BTW, the table I'm using now is a little smaller, as it turned out that
a few (75%) of the entries in db had only 3 different id values. This
didn't have any effect on the accurateness of the estimates, though).

BTW,
=> select count(id) from db;
586035

Thanks!
joostje

-- what pg_stat thinks about db.id:

=> select n_distinct, most_common_vals, most_common_freqs from pg_stats where tablename='db' and attname='id';
n_distinct | most_common_vals | most_common_freqs
------------+-----------------------------------------------------+-------------------------------------------------------------------------------------------------------
1907 | {subo,smys,raha,sjbo,sdai,roal,sooi,stsw,rmwi,snuw} | {0.00733333,0.007,0.00633333,0.00633333,0.006,0.00566667,0.00566667,0.00566667,0.00533333,0.00533333}

--these estimates are far off:

=> select id, count (id)/586035.0 from db where id='subo' or id='smys' or id='raha' or id='sjbo' or id='sdai' or id='roal' or id='sooi' or id='stsw' or id='rmwi' or id='snuw' group by id;
id | ?column?
------+----------------------
raha | 0.000156987210661479
rmwi | 3.24212717670446e-05
roal | 6.3136160809508e-05
sdai | 8.70255189536461e-05
sjbo | 6.3136160809508e-05
smys | 7.5080839881577e-05
snuw | 4.26595681145324e-05
sooi | 0.000114327642546947
stsw | 6.14297780849267e-05
subo | 5.11914817374389e-05

--and these would be the real most_common_freqs:

=> select id, count(id), count(id)/586035.0 from db group by id order by - count(id) limit 10;
id | count | ?column?
--------+-------+----------------------
indmem | 194 | 0.000331038248568771
hton | 97 | 0.000165519124284386
raha | 92 | 0.000156987210661479
simo | 87 | 0.000148455297038573
sugn | 87 | 0.000148455297038573
rjgl | 85 | 0.00014504253158941
hroy | 84 | 0.000143336148864829
jrgv | 84 | 0.000143336148864829
tojo | 83 | 0.000141629766140248
lucy | 82 | 0.000139923383415666

-- the above all done after a 'vacuum analyse';

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2003-04-10 16:44:35 Re: estimates for nested loop very wrong?
Previous Message Franco Bruno Borghesi 2003-04-10 16:12:49 Re: Trigger