From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> |
Cc: | "Kevin McArthur" <Kevin(at)stormtide(dot)ca>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Bad Planner Statistics for Uneven distribution. |
Date: | 2006-07-22 17:03:58 |
Message-ID: | 27058.1153587838@sss.pgh.pa.us |
Lists: | pgsql-performance |
"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> writes:
> Isn't there any way to make PostgreSQL have a better estimation here:
> -> Index Scan using models_brands_brand on models_brands
> (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008
> rows=0 loops=303)
> Index Cond: (brand = $0)
Note that the above plan extract is pretty misleading, because it
doesn't account for the implicit "LIMIT 1" of an EXISTS() clause.
What the planner is *actually* imputing to this plan is 216410.97/92372
cost units, or about 2.34. However that applies to the seqscan variant
as well.
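
The arithmetic above can be checked with a short sketch. It assumes a plain linear proration of the total cost over the estimated row count, which is a simplification for illustration and not the planner's actual code; the function name `cost_for_first_rows` is hypothetical. The startup cost in the quoted plan is 0.00, so it drops out here.

```python
# Figures copied from the quoted EXPLAIN line.
TOTAL_COST = 216410.97   # estimated cost to fetch every matching row
EST_ROWS = 92372         # estimated number of matching rows


def cost_for_first_rows(total_cost, est_rows, limit=1):
    """Prorate the total cost linearly over the rows actually fetched.

    Hypothetical helper: assumes zero startup cost and a uniform
    per-row cost, which is only an approximation.
    """
    fraction = min(limit, est_rows) / est_rows
    return total_cost * fraction


# EXISTS() stops after the first row, i.e. an implicit LIMIT 1.
per_probe = cost_for_first_rows(TOTAL_COST, EST_ROWS, limit=1)
print(round(per_probe, 2))  # 2.34
```

So the 216410.97 shown in the plan line is the cost of fetching all 92372 estimated rows, while the cost charged for one EXISTS() probe is roughly 1/92372 of that.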
I think the real issue with Kevin's example is that when doing an
EXISTS() on a brand_id that doesn't actually exist in the table, the
seqscan plan has worst-case behavior (i.e., scan the whole table) while
the indexscan plan still manages to be cheap. Because his brands table
has so many brand_ids that aren't in the table, that case dominates the
results. Not sure how we could factor that risk into the cost
estimates. The EXISTS code could probably special-case it reasonably
well for the simplest seqscan and indexscan subplans, but I don't see
what to do with more general subqueries (like joins).
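
The asymmetry described above can be illustrated with a toy model that counts how many entries each strategy examines per EXISTS() probe. Everything in it is an assumption made for illustration: the table contents, the ten brands that are present, and the use of a binary search as a stand-in for an index probe. Only the probe count of 303 comes from the `loops=303` in the quoted plan.

```python
def seqscan_exists(rows, key):
    """Linear scan that stops at the first match (the implicit LIMIT 1).

    Returns (found, rows_examined). A missing key forces a full scan.
    """
    for examined, value in enumerate(rows, start=1):
        if value == key:
            return True, examined
    return False, len(rows)


def indexscan_exists(sorted_rows, key):
    """Binary search over sorted keys, a rough stand-in for an index probe.

    Returns (found, comparison_steps). Cost is about log2(n) whether
    or not the key exists.
    """
    lo, hi, steps = 0, len(sorted_rows), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_rows[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_rows) and sorted_rows[lo] == key
    return found, steps


# Hypothetical data: 10,000 rows covering only brands 0..9.
table = [brand for brand in range(10) for _ in range(1000)]
# 303 probes as in the quoted plan; here only 10 of them can match.
probes = range(303)

seq_total = sum(seqscan_exists(table, b)[1] for b in probes)
idx_total = sum(indexscan_exists(table, b)[1] for b in probes)

print("seqscan rows examined:", seq_total)
print("index probe steps:    ", idx_total)
```

In this toy setup the 293 probes for absent brands each read all 10,000 rows, so they account for nearly all of the sequential-scan work, while the index-style probe stays at about 14 steps per lookup either way.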
regards, tom lane