| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> |
| Cc: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, john(at)geeknet(dot)com(dot)au, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org |
| Subject: | Re: [PATCHES] UNICODE characters above 0x10000 |
| Date: | 2004-08-07 16:43:20 |
| Message-ID: | 350.1091897000@sss.pgh.pa.us |
| Lists: | pgsql-hackers pgsql-patches |

Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:
> On Sat, 7 Aug 2004, Tatsuo Ishii wrote:
>> Anyway my point is if current specification of Unicode only allows
>> 24-bit range, why we need to allow usage against the specification?

> Is there a specific reason you want to restrict it to 24 bits?

I see several places that have to allocate space on the basis of the
maximum encoded character length possible in the current encoding
(look for uses of pg_database_encoding_max_length). Probably the only
one that's really significant for performance is text_substr(), but
that's enough to be an argument against setting maxmblen higher than
we have to.
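
To illustrate the allocation concern: code that cannot cheaply determine a string's byte length in advance tends to reserve the character count multiplied by the maximum encoded length. The following is only a minimal sketch; `worst_case_bytes` is a hypothetical helper invented for illustration, not a function in the PostgreSQL source.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): worst-case number of
 * bytes needed to hold nchars characters when a single character may
 * occupy up to max_mblen bytes in the current encoding.
 */
static size_t
worst_case_bytes(size_t nchars, int max_mblen)
{
    return nchars * (size_t) max_mblen;
}
```

For a 100-character substring, raising the maximum from 3 to 4 bytes raises the worst-case reservation from 300 to 400 bytes, while a 6-byte maximum would double it to 600.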

It looks to me like supporting 4-byte UTF-8 characters would be enough
to handle the existing range of Unicode codepoints, and that is probably
as much as we want to do.

If I understood what I was reading, this would take several things:

* Remove the "special UTF-8 check" in pg_verifymbstr;
* Extend pg_utf2wchar_with_len and pg_utf_mblen to handle the 4-byte case;
* Set maxmblen to 4 in the pg_wchar_table[] entry for UTF-8.
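
As a rough illustration of what handling the 4-byte case involves, the sketch below classifies a UTF-8 lead byte and decodes a 4-byte sequence. The names `utf8_mblen_sketch` and `utf8_decode4_sketch` are hypothetical, and this is not the actual code of pg_utf_mblen or pg_utf2wchar_with_len; it assumes the input has already been validated.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: byte length of the UTF-8 character starting at s,
 * judged from the lead byte alone. Lead bytes that do not start a 1- to
 * 4-byte sequence are treated as length 1 (an assumption for this sketch).
 */
static int
utf8_mblen_sketch(const unsigned char *s)
{
    if ((*s & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx */
    if ((*s & 0xe0) == 0xc0)
        return 2;               /* 110xxxxx */
    if ((*s & 0xf0) == 0xe0)
        return 3;               /* 1110xxxx */
    if ((*s & 0xf8) == 0xf0)
        return 4;               /* 11110xxx */
    return 1;
}

/*
 * Hypothetical sketch: decode an already-validated 4-byte sequence.
 * The lead byte contributes 3 bits, each continuation byte 6 bits.
 */
static unsigned int
utf8_decode4_sketch(const unsigned char *s)
{
    return ((unsigned int) (s[0] & 0x07) << 18) |
           ((unsigned int) (s[1] & 0x3f) << 12) |
           ((unsigned int) (s[2] & 0x3f) << 6) |
           (unsigned int) (s[3] & 0x3f);
}
```

For example, the byte sequence F0 90 80 80 has length 4 and decodes to 0x10000, the first code point that needs four bytes.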

Are there any other places that would have to change? Would this break
anything? The testing aspect is what's bothering me at the moment.

regards, tom lane