From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | t-ishii(at)sra(dot)co(dot)jp, Goran Thyni <goran(at)kirra(dot)net>, pgsql-hackers(at)postgreSQL(dot)org, Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
Subject: | Re: [HACKERS] Re: locales and MB (was: Postgres 6.5 beta2 and beta3 problem) |
Date: | 1999-06-11 15:14:55 |
Message-ID: | 199906111514.AAA00712@ext16.sra.co.jp |
Lists: | pgsql-hackers |
> Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> > Currently the mb support allows several internal
> > encodings including Unicode and mule-internal-code.
> > (yes, you can do regexp/like to Unicode data if mb support is
> > enabled).
>
> One of the things that bothers me about makeIndexable() is that it
> doesn't seem to be multibyte-aware; does it really work in MB case?
Yes. This is because I carefully chose multibyte encodings for
the backend that have the following characteristics:
o if the 8th bit of a byte is off, then it is an ASCII character
o otherwise it is part of a non-ASCII multibyte character
With these assumptions, makeIndexable() works very well with multibyte
chars.
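The two conditions above can be checked mechanically. The following is only a rough sketch in Python, not backend code: high_bit_safe() is a hypothetical helper, and "euc_jp" and "utf-8" are assumed here as stand-ins for the kind of server encodings meant above.

```python
def high_bit_safe(text, encoding):
    """Return True if every byte of every non-ASCII character has the 8th bit set.

    Hypothetical helper for illustration only; it is not part of PostgreSQL.
    """
    for ch in text:
        if ord(ch) < 0x80:
            continue  # plain ASCII character, nothing to check
        if any(b < 0x80 for b in ch.encode(encoding)):
            return False  # some byte of a multibyte character looks like ASCII
    return True


sample = "日本語 LIKE 'abc%'"
print(high_bit_safe(sample, "euc_jp"))
print(high_bit_safe(sample, "utf-8"))
```

When the check holds, a byte-wise scan for an ASCII character such as '%' or '\' can only ever hit a real ASCII character, which is why code that is not multibyte-aware still behaves correctly.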
Not all multibyte encodings satisfy the above conditions. For example,
SJIS (an encoding for Japanese) and Big5 (for traditional Chinese)
do not satisfy those requirements. In these encodings the first
byte of a double-byte character always has the 8th bit on. However, the
second byte sometimes has the 8th bit off: this means we cannot
distinguish it from ASCII, since it may accidentally match the bit
pattern of an ASCII char. This is why I do not allow SJIS and Big5 as
server encodings. Users can use SJIS and Big5 for the client encoding,
however.
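To make the SJIS problem concrete, here is a small sketch in Python (again not backend code). false_ascii_hits() is a hypothetical helper, and the katakana character "ソ" is chosen as an assumption-free example because its SJIS encoding is the byte pair 0x83 0x5C, whose second byte is the same as ASCII backslash.

```python
def false_ascii_hits(text, encoding):
    """List (character, ascii_lookalike) pairs where a trailing byte of a
    multibyte character has the 8th bit off and so looks like ASCII.

    Hypothetical helper for illustration only.
    """
    hits = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # a real ASCII character is not a false hit
        for b in ch.encode(encoding)[1:]:  # skip the lead byte
            if b < 0x80:
                hits.append((ch, chr(b)))
    return hits


print(false_ascii_hits("ソ", "shift_jis"))  # [('ソ', '\\')]
print(false_ascii_hits("ソ", "euc_jp"))     # []
```

A byte-wise scan that treats 0x5C as an escape character would therefore misread the second half of this character in SJIS, while the same character in EUC-JP has the 8th bit set in both bytes and causes no confusion.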
You might ask why I don't make makeIndexable() multibyte-aware. It is
definitely possible. But you should know there are many places that
would need to be multibyte-aware in this sense. The parser is one
good example. Making everything in the backend multibyte-aware is not
worth doing, in my opinion.
---
Tatsuo Ishii