Re: [HACKERS] another locale problem

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] another locale problem
Date: 1999-06-14 07:39:52
Message-ID: 199906140740.KAA17559@dcave.digsys.bg
Lists: pgsql-hackers

Tom,

So you are saying that this check prevents the use of indexes when we use the ~*
operator and the pattern contains alphabetic characters, because the index
apparently cannot do case-insensitive matching.

I was under the (apparently wrong) impression that it was possible to use
indexes for case-insensitive matching. Dreaming... :-)

For your information, isalpha() is the correct test for case-foldable
characters, at least in the cp1251 (windows-1251) locale. I believe a more
correct test would be to consult the locale's MAPLOWER and MAPUPPER tables.

It is not the case in Bulgarian, but there might be languages where a letter
does not exist in both upper and lower case and therefore requires more
complex handling. Perhaps such a situation exists in the multibyte locales.

Please excuse my confusion. :-)

Daniel

>>>Tom Lane said:
> Daniel Kalchev <daniel(at)digsys(dot)bg> writes:
> > In fact, after giving it some thought... the expression in gram.y
>
> > (strcmp(opname,"~*") == 0 && isalpha(n->val.val.str[pos])))
>
> > is wrong. The statement in my view decides that a regular expression is not
> > indexable if it contains special characters or if it contains non-alpha
> > characters. Therefore, the statement should be written as:
>
> > (strcmp(opname,"~*") == 0 && !isalpha((unsigned char)n->val.val.str[pos])))
>
> No, it's not wrong, at least not in that way! You've missed the point
> entirely. ~* is the *case insensitive* regexp match operator.
> Therefore if I have a pattern like '^abc' it can match anything
> beginning with either 'a' or 'A'. If the index restriction were to
> include the letter 'a' then it would exclude valid matches starting with
> 'A'. The simplest solution, which is what's in makeIndexable(), is
> to exclude case-foldable characters from the index restriction pattern.
> In this particular case you end up getting no index restriction at all,
> but that is indeed what's supposed to happen.
>
> I am not sure that isalpha() is an adequate test for case-foldable
> characters in non-ASCII locales, but inverting it is definitely wrong ;-)
>
> regards, tom lane
