Re: [HACKERS] Re: locales and MB (was: Postgres 6.5 beta2 and beta3 problem)

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: t-ishii(at)sra(dot)co(dot)jp, Goran Thyni <goran(at)kirra(dot)net>, pgsql-hackers(at)postgreSQL(dot)org, Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Subject: Re: [HACKERS] Re: locales and MB (was: Postgres 6.5 beta2 and beta3 problem)
Date: 1999-06-11 15:14:55
Message-ID: 199906111514.AAA00712@ext16.sra.co.jp
Lists: pgsql-hackers

> Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> > Currently the mb support allows several internal
> > encodings including Unicode and mule-internal-code.
> > (yes, you can do regexp/like to Unicode data if mb support is
> > enabled).
>
> One of the things that bothers me about makeIndexable() is that it
> doesn't seem to be multibyte-aware; does it really work in MB case?

Yes. This is because I carefully chose multibyte encodings for
the backend that have the following characteristics:

o if the 8th bit of a byte is off, then it is an ASCII character
o otherwise it is part of a non-ASCII multibyte character

With these assumptions, makeIndexable() works very well with multibyte
chars.
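
To illustrate the two conditions above, here is a minimal sketch in Python
(not backend code). It uses Python's "euc_jp" and "utf-8" codecs as
stand-ins for encodings with this property; the helper names and the sample
string are hypothetical, and makeIndexable() itself is not reproduced here.

```python
# Sample text: three kanji followed by the ASCII characters "%abc".
SAMPLE = "\u65e5\u672c\u8a9e%abc"


def high_bit_property_holds(text, encoding):
    """True if every byte of every non-ASCII character has its 8th bit on."""
    for ch in text:
        if ord(ch) < 0x80:
            continue  # ASCII characters are allowed to have the bit off
        if any(b < 0x80 for b in ch.encode(encoding)):
            return False
    return True


def naive_find(data, ascii_char):
    """Byte-wise search that knows nothing about character boundaries."""
    return data.find(ascii_char.encode("ascii"))


for enc in ("euc_jp", "utf-8"):
    encoded = SAMPLE.encode(enc)
    # Because no byte of a multibyte character can look like ASCII,
    # the byte-wise search can only stop at the real '%'.
    print(enc, high_bit_property_holds(SAMPLE, enc), naive_find(encoded, "%"))
```

With such encodings, code that scans byte by byte for ASCII characters such
as '%' or '_' cannot be fooled by part of a multibyte character.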

Not all multibyte encodings satisfy the above conditions. For example,
SJIS (an encoding for Japanese) and Big5 (for traditional Chinese)
do not satisfy these requirements. In these encodings the first
byte of a double-byte character always has the 8th bit on. However, the
second byte sometimes has the 8th bit off: this means we cannot
distinguish it from ASCII, since it may accidentally match the bit
pattern of an ASCII character. This is why I do not allow SJIS and Big5
as server encodings. Users can still use SJIS and Big5 as the client
encoding, however.
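
The second-byte problem can be seen with a small Python sketch (again only
an illustration, using Python's "shift_jis", "big5" and "euc_jp" codecs; the
helper name and the choice of sample characters are mine, not taken from the
backend).

```python
def low_trail_bytes(ch, encoding):
    """Return the non-leading bytes of one character whose 8th bit is off."""
    raw = ch.encode(encoding)
    return [b for b in raw[1:] if b < 0x80]


KATAKANA_SO = "\u30bd"   # a Japanese katakana character
HAN_XU = "\u8a31"        # a traditional Chinese character

# In SJIS and Big5 the second byte of these characters is 0x5C,
# the same bit pattern as the ASCII backslash.
print(low_trail_bytes(KATAKANA_SO, "shift_jis"))
print(low_trail_bytes(HAN_XU, "big5"))

# In EUC-JP the same katakana has no byte in the ASCII range.
print(low_trail_bytes(KATAKANA_SO, "euc_jp"))

# A byte-wise search therefore "finds" a backslash that is not in the text.
print(KATAKANA_SO.encode("shift_jis").find(b"\\"))
```

A byte-oriented routine would treat that second byte as an ordinary ASCII
backslash, which is exactly the kind of accidental match described above.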

You might ask why I don't make makeIndexable() multibyte-aware. It is
definitely possible. But you should know that there are many places that
would need to be multibyte-aware in this sense. The parser is one
good example. Making everything in the backend multibyte-aware is not
worth doing, in my opinion.
---
Tatsuo Ishii
