Re: Are statistics gathered on function indexes?

From: Ray Ontko <rayo(at)ontko(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ray Ontko <rayo(at)ontko(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Are statistics gathered on function indexes?
Date: 2002-06-28 12:40:37
Message-ID: 200206281240.HAA03045@shire.ontko.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Tom,

> > In other words, if I only give one letter, then I might expect
> > to get about 1/10 of the table, and a full scan might make sense.
> > But the cost should continue to decline as I give longer and longer
> > strings, up to the length of the field.
> > Would this be a reasonable improvement to the optimizer?
>
> It's there already; what did you think was making the difference
> between W% and WI% ?

Yes, but the cost doesn't continue to decline if I make the LIKE
more and more restrictive by going from WI% to WIL% to WILL%, etc.
The current approach assumes, perhaps correctly, that with only
one letter, you might as well do a full table scan, but with 2
or more letters, you might as well use an index and make a
reasonable guess at the cost.

The limitation with this approach is that the optimizer is
considering a number of different options if the join includes
a number of tables and constraints. My query suffers from a
one-size-fits-all approach. If the optimizer had a better
guess on cost, it could choose to drive my query using this
index instead of another.

By allowing the cost to decline as the length of the string
increases, we're making the bet that longer strings are more
selective and require fewer random pages to be read, among
other improvements. Note that this would behave badly in
situations where all the values in the index begin with the same
10 characters and the LIKE string is less than 10 characters
long. This is already a problem for the current approach, and
I think could only be solved by adding statistics for the
selectivity of increasingly longer strings.

BTW, I think that this discussion probably belongs in a different
list (e.g., the one for hacking the optimizer). Since I'm not
(yet) planning to jump in to the code, my purpose for raising
the question here is to help me (and others on the list) understand
the capabilities and limitations of indexes and the optimizer so
that we can make better use of what we have currently, and help
identify areas for improvement.

Ray
----------------------------------------------------------------------
Ray Ontko rayo(at)ontko(dot)com Phone 1.765.935.4283 Fax 1.765.962.9788
Ray Ontko & Co. Software Consulting Services http://www.ontko.com/

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2002-06-28 14:01:02 Re: Are statistics gathered on function indexes?
Previous Message Felipe Nascimento 2002-06-28 12:37:01 Re: Transactions