Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Kalchev <daniel(at)digsys(dot)bg>, Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Date: 1999-06-10 00:44:48
Message-ID: 199906100044.UAA14419@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Yes. This pretty much sums up the problem.

> Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> > This certainly explains it. With locale enabled, LIKE does not use
> > indexes because we can't figure out how to do the indexing trick with
> > non-ASCII character sets because we can't figure out the maximum
> > character value for a particular encoding.
>
> We don't actually need the *maximum* character value, what we need is
> to be able to generate a *slightly larger* character value.
>
> For example, what the parser is doing now:
> fld LIKE 'abc%' ==> fld <= 'abc\377'
> is not even really right in ASCII locale, because it will reject a
> data value like 'abc\377x'.
>
> I think what we really want is to generate the "next value of the
> same length" and use a < comparison. In ASCII locale this means
> fld LIKE 'abc%' ==> fld < 'abd'
> which is reliable regardless of what comes after abc in the data.
>
> The trick is to figure out a "next" value without assuming a lot
> about the local character set and collation sequence. I had
> been thinking about a brute force method: generate a string and
> check to see whether strcmp claims it's greater than the original
> or not; if not, increment the last byte and try again. You'd
> also have to be able to back up and increment earlier bytes if
> you maxed out the last one. For example, in French locale,
> fld LIKE 'ab\376%'
> you'd first produce 'ab\377' but discover that it's less than
> 'ab\376' (because \377 is y-dieresis which sorts like 'y').
> Your next try must be 'ac\377' which will succeed.
>
> But I am worried whether this trick will work in multibyte locales ---
> incrementing the last byte might generate an invalid character sequence
> and produce unpredictable results from strcmp. So we need some help
> from someone who knows a lot about collation orders and multibyte
> character representations.
>
> regards, tom lane
>

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1999-06-10 00:51:04 Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Previous Message Bruce Momjian 1999-06-10 00:40:28 Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem