Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: Daniel Kalchev <daniel(at)digsys(dot)bg>, Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Date: 1999-06-09 16:08:52
Message-ID: 29189.928944532@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> This certainly explains it. With locale enabled, LIKE does not use
> indexes because we can't figure out how to do the indexing trick with
> non-ASCII character sets because we can't figure out the maximum
> character value for a particular encoding.

We don't actually need the *maximum* character value, what we need is
to be able to generate a *slightly larger* character value.

For example, what the parser is doing now:
fld LIKE 'abc%' ==> fld <= 'abc\377'
is not even really right in ASCII locale, because it will reject a
data value like 'abc\377x'.
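A minimal C sketch of that failure. It assumes a byte-wise (ASCII / C locale) comparison, and the helper name within_current_bound is hypothetical; it checks only the upper bound quoted above:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: does fld satisfy the upper bound the parser
 * currently generates for LIKE 'abc%', i.e. fld <= 'abc\377'?
 * strcmp compares bytes as unsigned char, which matches an ASCII (C) locale. */
int within_current_bound(const char *fld)
{
    assert(fld != NULL);
    return strcmp(fld, "abc\377") <= 0;
}
```

within_current_bound("abcdef") returns 1, but within_current_bound("abc\377x") returns 0 even though that value matches LIKE 'abc%'.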

I think what we really want is to generate the "next value of the
same length" and use a < comparison. In ASCII locale this means
fld LIKE 'abc%' ==> fld < 'abd'
which is reliable regardless of what comes after abc in the data.
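For a byte-wise collation the "next value of the same length" is just the prefix with its last byte incremented. A sketch in C with hypothetical helper names; this simple version gives up (returns 0) when the last byte is already \377 instead of carrying into earlier bytes:

```c
#include <assert.h>
#include <string.h>

/* Byte-wise (ASCII / "C" collation) sketch: build the next value of the
 * same length by incrementing the last byte of the LIKE prefix, so that
 *     fld LIKE 'abc%'  ==>  fld >= 'abc' AND fld < 'abd'.
 * Returns 1 on success, 0 if the prefix is empty, does not fit in out,
 * or ends in 0xFF (no carry into earlier bytes in this version). */
int next_same_length(const char *prefix, char *out, size_t outsz)
{
    size_t len;

    assert(prefix != NULL && out != NULL);
    len = strlen(prefix);
    if (len == 0 || len + 1 > outsz)
        return 0;
    if ((unsigned char) prefix[len - 1] == 0xFF)
        return 0;
    memcpy(out, prefix, len + 1);       /* copies the terminating NUL too */
    out[len - 1] = (char) ((unsigned char) out[len - 1] + 1);
    return 1;
}

/* Range test that stands in for LIKE 'prefix%' once the bound is known. */
int in_like_range(const char *fld, const char *prefix, const char *upper)
{
    return strcmp(fld, prefix) >= 0 && strcmp(fld, upper) < 0;
}
```

next_same_length("abc", ...) yields "abd", and in_like_range("abc\377x", "abc", "abd") is true, which is exactly the case the current <= 'abc\377' bound gets wrong.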

The trick is to figure out a "next" value without assuming a lot
about the local character set and collation sequence. I had
been thinking about a brute force method: generate a string and
check to see whether strcmp (strictly, strcoll in a locale-enabled build) claims it's greater than the original
or not; if not, increment the last byte and try again. You'd
also have to be able to back up and increment earlier bytes if
you maxed out the last one. For example, in French locale,
fld LIKE 'ab\376%'
you'd first produce 'ab\377' but discover that it's less than
'ab\376' (because \377 is y-dieresis which sorts like 'y').
Your next try must be 'ac\377' which will succeed.
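The brute-force search described above might look roughly like this in C. It is a sketch, not backend code: the function name is hypothetical, comparisons go through strcoll so that LC_COLLATE governs the result, and when backing up, the later bytes are filled with \377, following the 'ab\376' to 'ac\377' example. It checks only that the candidate sorts above the prefix itself, which is the test proposed here; it does not prove that every string beginning with the prefix sorts below the candidate.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical brute-force "next string" search.  Starting at the last
 * byte, try each larger byte value and accept the first candidate that
 * strcoll says is greater than the prefix.  When a position is exhausted,
 * back up one byte and fill the bytes after it with 0xFF.
 * Writes a same-length string into out and returns 1 on success. */
int brute_force_next(const char *prefix, char *out, size_t outsz)
{
    size_t len, i, j;
    unsigned b;

    assert(prefix != NULL && out != NULL);
    len = strlen(prefix);
    if (len == 0 || len + 1 > outsz)
        return 0;

    for (i = len; i-- > 0; )            /* last byte first, then back up */
    {
        memcpy(out, prefix, len + 1);
        for (j = i + 1; j < len; j++)
            out[j] = (char) 0xFF;       /* max out the later bytes */
        for (b = (unsigned char) prefix[i] + 1; b <= 0xFF; b++)
        {
            out[i] = (char) b;
            if (strcoll(out, prefix) > 0)
                return 1;
        }
    }
    return 0;                           /* every position maxed out */
}
```

In the default C locale strcoll behaves like strcmp, so "abc" gives "abd" and "ab\377" gives "ac\377". Reproducing the French case would require calling setlocale(LC_COLLATE, ...) with a locale name, which is platform-dependent.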

But I am worried whether this trick will work in multibyte locales ---
incrementing the last byte might generate an invalid character sequence
and produce unpredictable results from strcmp. So we need some help
from someone who knows a lot about collation orders and multibyte
character representations.
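To make the multibyte worry concrete, here is a small C check that uses UTF-8 purely as an example encoding (an assumption; other multibyte encodings follow different rules). The helper utf8_well_formed is hypothetical and checks only sequence structure, not overlong forms or surrogates:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structural UTF-8 check: every lead byte must be followed
 * by the right number of 10xxxxxx continuation bytes.  Overlong forms and
 * surrogates are not rejected; this only illustrates the hazard. */
int utf8_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len)
    {
        unsigned char c = s[i];
        size_t need, k;

        if (c < 0x80)
            need = 0;                   /* ASCII byte */
        else if (c >= 0xC2 && c <= 0xDF)
            need = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            need = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            need = 3;
        else
            return 0;                   /* stray continuation or bad lead */

        if (need > len - i - 1)
            return 0;                   /* sequence runs off the end */
        for (k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* expected a continuation byte */
        i += need + 1;
    }
    return 1;
}
```

In UTF-8, y-dieresis is the byte pair \303\277. Incrementing the last byte gives \303\300, which utf8_well_formed rejects because \300 is not a continuation byte, so whatever strcoll returns for such a string cannot be trusted. Incrementing e-acute (\303\251) instead gives e-circumflex (\303\252), which is still valid, so the failure depends on the data.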

regards, tom lane
