From: | The Hermit Hacker <scrappy(at)hub(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: appendum: Re: *really* simple select doesn't use indices ... |
Date: | 2001-05-29 14:13:07 |
Message-ID: | Pine.BSF.4.33.0105291110360.82504-100000@mobile.hub.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 29 May 2001, Tom Lane wrote:
> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > The column 'zip' is of type text. As such, indices will not be used except
> > in the case when the where clause is WHERE zip ~ '^<text>' for btree
> > indices.
>
> Uh ... nonsense.
Oh good, I was worried there for a sec ... :)
> > On Tue, 29 May 2001, Marc G. Fournier wrote:
> >> globalmatch=# vacuum verbose analyze locations;
> >> NOTICE: --Relation locations--
> >> NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
>
> >> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
> >> NOTICE: QUERY PLAN:
> >>
> >> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)
>
> Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
> rows, which is more than enough to drive it to a sequential scan
> (with an average of more than three matched rows on every page of the
> relation, there'd be no I/O savings at all from consulting the index).
>
> Since the real number of matches is only 1, this estimate is obviously
> way off. In 7.1 the estimate is being driven by the frequency of the
> most common value in the column --- what is the most common value? If
> you're lucky, the most common value is a dummy (empty string, maybe)
> that you could replace by NULL with a few simple changes in
> application logic. 7.1 is smart enough to distinguish NULL from real
> data values in its estimates. If you're not lucky, there really are a
> few values that are far more common than average, in which case you're
> stuck unless you want to run development sources. Current sources
> should do a lot better on that kind of data distribution.
Hit it right on the mark:
zip | cnt
-------+-------
| 81403
00210 | 1
00211 | 1
Will look at the code and see what I can do abuot that NULL issue ...
thanks :)
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2001-05-29 14:13:16 | Re: /contrib/unixdate: Broke in cvs tip. |
Previous Message | Oleg Bartunov | 2001-05-29 14:12:20 | haskeytype and index_formtuple |