Can someone tell me where we are on this? Tatsuo, I think you said you
wanted to apply this fix.
> [Cced to hackers list]
>
> > > BTW I have found another bug with EUC_TW support. line 917 in conv.c:
> > >
> > > *p++ = c1 - LC_CNS11643_3 + 0xa3;
> > >
> > > this should be:
> > >
> > > *p++ = *mic++ - LC_CNS11643_3 + 0xa3;
> > >
> > > Otherwise, CNS 11643-1992 Plane 3 or more won't work. Could you test
> > > it out with CNS 11643-1992 Plane 3 or more?
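The proposed change can be sketched in isolation as below. This is only an illustration, not the code from conv.c: the function name mic_prv2_to_euc_tw is hypothetical, and the values of SS2, LCPRV2 and LC_CNS11643_3 are assumptions that should be checked against the real headers. It also assumes that c1 at that point holds a prefix byte for private charsets and that the following byte identifies the plane, which is why the fix reads *mic++ instead of reusing c1.

```c
#include <stddef.h>

/* Assumed values for illustration only; check the real headers. */
#define SS2            0x8e  /* EUC single-shift 2 */
#define LCPRV2         0x9d  /* assumed prefix byte for private 2-byte charsets */
#define LC_CNS11643_3  0xf6  /* assumed charset id for CNS 11643-1992 Plane 3 */

/*
 * Hypothetical helper: convert one prefixed internal character
 * (prefix, charset id, two code bytes) into 4-byte EUC_TW form
 * (SS2, plane byte, two code bytes).
 * Returns the number of bytes written, or 0 if the prefix does not match.
 */
size_t mic_prv2_to_euc_tw(const unsigned char *mic, unsigned char *p)
{
    unsigned char c1 = *mic++;          /* prefix byte, not the charset id */

    if (c1 != LCPRV2)
        return 0;

    *p++ = SS2;
    /* Read the charset id from the stream. Using c1 here would yield
     * the same plane byte for every plane, under the assumptions above. */
    *p++ = (unsigned char) (*mic++ - LC_CNS11643_3 + 0xa3);
    *p++ = *mic++;                      /* first code byte */
    *p++ = *mic++;                      /* second code byte */
    return 4;
}
```

With these assumed values, a Plane 3 character maps to plane byte 0xa3 and a Plane 4 character to 0xa4.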
> >
> > Thanks for your very quick reply!!
>
> You are welcome.
>
> > I think you are right, but I have not tested it,
> > because the original Big5 encoding does not contain characters in
> > CNS 11643-1992 Plane 3.
> > But I will have a chance to test it: we here are seeking support for Big5E
> > (an extended Big5 encoding) in PostgreSQL, though most people who use
> > PostgreSQL in Taiwan only care about the Big5 encoding.
> >
> > Would you like to answer some mb related questions for me? I am a newbie :P
> >
> > 1.) Because the 2nd byte of Big5 encoding overlaps with ASCII,
> > such as '\' (this is very bad for many programs to work with Big5).
>
> As long as the frontend side knows the current client side encoding is
> Big5, this should be no problem. At least for libpq. It examines the
> first byte of Big5. If it is greater than 0x7f, then it must be a
> double byte Hanji. So libpq reads 2 bytes in this case, no matter
> whether the second byte is '\'.
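The lead-byte rule described above can be sketched as follows. This is a minimal illustration, not libpq code: big5_char_count and big5_ends_with_backslash are hypothetical helpers, and the only rule they apply is the one stated in the mail (a byte greater than 0x7f starts a two-byte character).

```c
#include <stddef.h>

/*
 * Hypothetical helper: count characters in a Big5 byte string by
 * looking only at lead bytes. A byte above 0x7f starts a two-byte
 * character, so its second byte is skipped without being examined,
 * even if that byte is 0x5c ('\').
 */
size_t big5_char_count(const unsigned char *s, size_t len)
{
    size_t i = 0;
    size_t n = 0;

    while (i < len)
    {
        if (s[i] > 0x7f && i + 1 < len)
            i += 2;             /* double byte character */
        else
            i += 1;             /* ASCII, or a truncated lead byte */
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: return 1 only if the string ends in a real,
 * single-byte backslash, not in a trailing byte that happens to be 0x5c.
 */
int big5_ends_with_backslash(const unsigned char *s, size_t len)
{
    size_t i = 0;
    int last_is_backslash = 0;

    while (i < len)
    {
        if (s[i] > 0x7f && i + 1 < len)
        {
            last_is_backslash = 0;
            i += 2;
        }
        else
        {
            last_is_backslash = (s[i] == '\\');
            i += 1;
        }
    }
    return last_is_backslash;
}
```

For the byte sequence 0xb3 0x5c the second helper returns 0, because 0x5c is consumed as the trailing byte of a double byte character. A scanner that does not know the encoding is Big5 would instead see a backslash there.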
>
> > For example: If we initdb -E MULE_INTERNAL first,
> > SET CLIENT_ENCODING TO 'BIG5', and
> > INSERT INTO some_table VALUES (..., 'the last byte of some Big5 char is backslash\',...),
> > then we cannot successfully complete this SQL INSERT -- the prompt of psql
> > changes but psql does not execute it. If we initdb -E with EUC_TW, it's OK.
> > Is this a parsing problem? What's your suggestion for the solution?
>
> Hum. initdb -E MULE_INTERNAL should work as well. Let me dig into the
> problem. It would be nice if you could send me the Big5 data for
> testing by private mail.
>
> BTW I would not recommend "SET CLIENT_ENCODING TO 'BIG5'" for
> on-the-fly encoding changes, since this way the frontend side has no
> idea what the client encoding is. 7.0.x overcomes this problem by
> introducing the new \encoding command. For 6.5 or before I would
> recommend using the PGCLIENTENCODING environment variable.
>
> > 2.) Is using MULE_INTERNAL faster than EUC_TW as the backend encoding when
> > PostgreSQL processes Big5 data? (From the mb sources, it seems
> > BIG5->big52mic()->mic2euc_tw()->EUC_TW needs 2 code conversion procedures,
> > but BIG5->big52mic()->MULE_INTERNAL only needs one.)
>
> Yes. But the difference would be very small. The expensive part is a
> table look-up in big52mic.
>
> BTW 7.1 will support automatic encoding conversion between Unicode
> (UTF-8) and Big5 (or EUC_TW). Try the snapshot if you like.
>
> > 3.) Does PostgreSQL's ODBC driver support mb?
>
> I don't think so. For Japanese (EUC_JP/SJIS) Kataoka has made patches
> to enable MB support in ODBC. It should not be very difficult to
> support EUC_TW/Big5, though I am not sure.
> --
> Tatsuo Ishii
>
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026