> - corrects a bit the UTF-8 code from Tatsuo to allow Unicode 3.1
> characters (characters with values >= 0x10000, which are encoded on
> four bytes).
After applying your patches, do the 4-bytes UTF-8 convert to UCS-2 (2
bytes) or UCS-4 (4 bytes) in pg_utf2wchar_with_len()? If it were 4
bytes, we are in trouble. Current regex implementaion does not handle
4 byte width charsets.
--
Tatsuo Ishii