From: | mlw <markw(at)mohawksoft(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: SourceForge & Postgres |
Date: | 2000-12-13 00:20:03 |
Message-ID: | 3A36C0B3.ADAF69FE@mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane wrote:
>
> mlw <markw(at)mohawksoft(dot)com> writes:
> > btw anyone trying this query should use: "attdispersion"
>
> Sorry about that --- I just copied-and-pasted the query from some notes
> that are obsolete as of 7.1...
>
> > cdinfo=# explain select * from ztitles where artistid = 100000220 ;
> > NOTICE: QUERY PLAN:
>
> > Seq Scan on ztitles (cost=0.00..4740.75 rows=3163 width=296)
>
> > And this is with "-o -fs"
>
> > Index Scan using ztitles_artistid_ndx on ztitles (cost=0.00..5915.01
> > rows=3163 width=296)
>
> > attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival
> > artistid | 0.0477198 | 19274 | 2 | 97 | 0 | 0.149362 | 100050450 | 100000000 | 100055325
>
> The reason why the thing is going for a sequential scan is that
> astonishingly high stacommonfrac statistic. Does artistid 100050450
> really account for 14.9% of all the rows in your table? (Who is that
> anyway? ;-)) If so, a search for artistid 100050450 definitely *should*
> use a sequential scan.
I tested this statement against the database and you are right, about 14
seconds with the index, 4 without.
BTW ID # 100050450 is "Various Artists"
This is sort of a point I was trying to make in previous emails. I think
this situation, and this sort of ratio is far more likely than the
attention it has been given.
In about every project I have used postgres I have run into this. It is
only recently that I have understood what the problem was and how to get
around it (sort of).
This one entry is destroying any intelligent performance we could hope
to attain. As I said, I always see this sort of behavior in some
implementation.
> The problem at hand is estimating the frequency
> of entries for some other artistid, given that we only have this much
> statistical info available. Obviously the stats are insufficient, and
> I hope to do something about that in a release or two, but it ain't
> gonna happen for 7.1. In the meantime, if you've got huge outliers
> like that, you could try reducing the value of NOT_MOST_COMMON_RATIO
> in src/backend/utils/adt/selfuncs.c.
I did some playing with this value, and I can seem to have it
differentiate between 100050450 and anything else.
From | Date | Subject | |
---|---|---|---|
Next Message | mlw | 2000-12-13 00:41:36 | Re: external function proposal for 7.2 |
Previous Message | Nathan Myers | 2000-12-12 23:51:43 | Re: (one more time) Patches with vacuum fixes available . |