From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, drheart(at)wanadoo(dot)es, Lista PostgreSql <pgsql-general(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [HACKERS] Problem (bug?) with like |
Date: | 2001-12-04 17:30:33 |
Message-ID: | 3C0D0839.2010102@tm.ee |
Lists: | pgsql-general pgsql-hackers |
Tom Lane wrote:
>Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
>>But what about '%A%' vs. '%AC%'. Seems the second is reasonably
>>different from the first that our optimizer may be fine with that. Is it
>>only when the strings get longer that we lose specificity?
>>
>
>Yeah, I don't think that the estimates are bad for one or two
>characters. But the estimate gets real small real fast as you
>increase the number of match characters in the LIKE pattern.
>We need to slow that down some.
>
Could we just assign weights to the first few characters and then consider
only those first few characters when determining the probability of
finding a match? If someone searches for '%New York City%', we have quite
good reason to believe that some rows do contain it, so we should use
those weights to factor in the fact that people usually search for
strings that do exist.
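The weighting idea can be sketched as a toy Python model. This is not PostgreSQL's planner code: the per-character selectivity `PER_CHAR_SEL`, the weights in `WEIGHTS`, and all function names are hypothetical, chosen only for illustration. It contrasts a naive estimate that multiplies a fixed factor per literal character (which collapses quickly as the pattern grows, as Tom describes) with one that counts only the first few characters, each with a decreasing weight.

```python
import math

# Hypothetical constants for illustration; NOT PostgreSQL's actual values.
PER_CHAR_SEL = 0.2                 # assumed selectivity of one literal character
WEIGHTS = (1.0, 0.75, 0.5, 0.25)   # assumed weights for the first few characters

def literal_chars(pattern: str) -> str:
    """Drop the LIKE wildcards '%' and '_'; this sketch ignores escapes."""
    return "".join(ch for ch in pattern if ch not in "%_")

def naive_like_sel(pattern: str) -> float:
    """Every literal character multiplies the estimate by PER_CHAR_SEL."""
    return PER_CHAR_SEL ** len(literal_chars(pattern))

def weighted_like_sel(pattern: str) -> float:
    """Only the first len(WEIGHTS) literal characters count, with decreasing weight."""
    # zip() stops at the shorter input, so characters past the weights are ignored.
    exponent = sum(w for w, _ in zip(WEIGHTS, literal_chars(pattern)))
    return PER_CHAR_SEL ** exponent

if __name__ == "__main__":
    for p in ("%A%", "%AC%", "%New York City%"):
        print(f"{p!r}: naive={naive_like_sel(p):.3g} weighted={weighted_like_sel(p):.3g}")
```

With these assumed constants, '%New York City%' gets a naive estimate of about 8.2e-10, while the weighted estimate stops shrinking after four characters, at about 0.018.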
Another option would be to gather statistics not only on individual
letters but also on bigraphs or trigraphs. Then, as a next step, we could
implement proper trigraph indexes ;)
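Trigraph statistics could look roughly like the following Python sketch, which records, for a sample of column values, the fraction of values containing each 3-character substring, and then estimates LIKE '%needle%' from those fractions. This is a hypothetical approach, not an existing PostgreSQL feature; the function names, the unpadded trigram scheme, and the `floor` value for unseen trigrams are all assumptions. Taking the minimum is a deliberately conservative choice: a value containing the whole string must contain each of its trigrams, so the rarest trigram's frequency bounds the estimate from above (within the sample).

```python
from collections import Counter

def trigrams(s: str) -> set[str]:
    """All 3-character substrings of s (no padding, unlike some real schemes)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_trigram_stats(sample: list[str]) -> dict[str, float]:
    """Fraction of sampled values that contain each trigram at least once."""
    counts = Counter()
    for value in sample:
        counts.update(trigrams(value))  # a set, so each value counts a trigram once
    n = len(sample)
    return {t: c / n for t, c in counts.items()}

def estimate_contains_sel(needle: str, stats: dict[str, float],
                          floor: float = 0.001) -> float:
    """Estimate the selectivity of LIKE '%needle%' as its rarest trigram's frequency."""
    grams = trigrams(needle)
    if not grams:
        return 1.0  # needle shorter than 3 chars: this sketch gives no estimate
    # Trigrams never seen in the sample fall back to the assumed floor value.
    return min(stats.get(t, floor) for t in grams)

if __name__ == "__main__":
    stats = build_trigram_stats(["New York City", "New Jersey", "Boston", "York"])
    print(estimate_contains_sel("York City", stats))
```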
----------------
Hannu