> Can someone tell me where we are on this? Tatsuo, I think you said you
> wanted to apply this fix.
I wanted to apply the fix after Chih-Chang Hsieh tested it, but he
said he couldn't because no test data was available. However, he and
I now agree that the fix seems correct, and I am going to apply it,
probably by tomorrow.
> > [Cced to hackers list]
> >
> > > > BTW, I have found another bug in the EUC_TW support, at line 917 of conv.c:
> > > >
> > > > *p++ = c1 - LC_CNS11643_3 + 0xa3;
> > > >
> > > > this should be:
> > > >
> > > > *p++ = *mic++ - LC_CNS11643_3 + 0xa3;
> > > >
> > > > Otherwise, CNS 11643-1992 Plane 3 and higher won't work. Could you test
> > > > it with CNS 11643-1992 Plane 3 or higher?
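To make the one-line fix concrete, here is a minimal C sketch of the plane 3+ branch. It assumes the MULE internal layout for these planes is a private-charset prefix byte, then a charset byte, then the two code bytes, and it assumes the constant values shown, which are our recollection of PostgreSQL's pg_wchar.h rather than something stated in this thread. The helper name mic_cns3plus_to_euc_tw is hypothetical; it is not a function in conv.c. The point is that the EUC_TW plane byte (0xa3 for plane 3) must come from the charset byte that follows the prefix, which is what `*mic++` reads, not from c1.

```c
#include <stddef.h>

/* ASSUMED constant values, recalled from PostgreSQL's pg_wchar.h;
 * check the real header before relying on them. */
#define SS2           0x8e /* EUC single shift 2 */
#define LCPRV2        0x9d /* MULE prefix for private 2-byte charsets */
#define LC_CNS11643_3 0xf6 /* CNS 11643 planes 3..7 use 0xf6..0xfa */
#define LC_CNS11643_7 0xfa

/*
 * Hypothetical helper, not the real conv.c code: convert one MULE internal
 * CNS 11643 plane 3..7 character, laid out as (LCPRV2, charset, c1, c2),
 * into EUC_TW, laid out as (SS2, 0xa3..0xa7, c1, c2).
 * Returns the number of bytes written, or 0 if mic is not such a character.
 */
static size_t
mic_cns3plus_to_euc_tw(const unsigned char *mic, unsigned char *p)
{
	int			c1 = *mic++;	/* prefix byte, like c1 in the conv.c loop */

	if (c1 != LCPRV2 || *mic < LC_CNS11643_3 || *mic > LC_CNS11643_7)
		return 0;

	p[0] = SS2;
	/* The fix: derive the plane from the charset byte after the prefix.
	 * Using c1 here (the prefix itself) would produce a garbage byte. */
	p[1] = (unsigned char) (*mic++ - LC_CNS11643_3 + 0xa3);
	p[2] = *mic++;				/* the two code bytes pass through unchanged */
	p[3] = *mic;
	return 4;
}
```

With the buggy line, the plane byte would be computed from the prefix (0x9d - 0xf6 + 0xa3), which is not a valid EUC_TW plane byte at all, so every plane 3+ character would come out corrupted.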
> > >
> > > Thanks for your very quick reply!!
> >
> > You are welcome.
> >
> > > I think you are right, but I have not tested it, because the original
> > > Big5 encoding does not contain any characters in CNS 11643-1992 Plane 3.
> > > But I will have a chance to test it: we are seeking support for Big5E
> > > (an extended Big5 encoding) in PostgreSQL, although most people who use
> > > PostgreSQL in Taiwan only care about the Big5 encoding.
> > >
> > > Would you mind answering some mb-related questions for me? I am a newbie :P
> > >
> > > 1.) The 2nd byte of a Big5 character can overlap with ASCII, e.g. '\'
> > > (this makes Big5 very hard for many programs to handle).
> >
> > As long as the frontend knows that the current client encoding is
> > Big5, this should be no problem, at least for libpq. It examines the
> > first byte of each Big5 character: if it is greater than 0x7f, it must
> > be the start of a double-byte Hanzi, so libpq reads 2 bytes in this
> > case, no matter whether the second byte is '\'.
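A minimal C sketch of that rule, applied to finding the end of a single-quoted literal. It is not libpq or psql code; big5_mblen and find_literal_end are hypothetical names, and the sketch assumes Big5 only. Big5 trail bytes can fall in 0x40..0x7e, so a trail byte of 0x5c ('\') is possible. The byte-at-a-time mode shows how such a trail byte gets mistaken for an escape, so the scan runs past the real closing quote; the Big5-aware mode steps over the whole character and finds the quote.

```c
#include <stddef.h>

/* A Big5 lead byte is above 0x7f and starts a 2-byte character. */
static int
big5_mblen(const unsigned char *s)
{
	return (*s > 0x7f) ? 2 : 1;
}

/*
 * Hypothetical scanner.  s[0] must be the opening quote of a single-quoted
 * literal.  Returns the offset just past the closing quote, or -1 if the
 * literal is unterminated within len bytes.  A backslash makes the next
 * character literal.  With mb_aware set, the scan steps over whole Big5
 * characters, so a trail byte of 0x5c is never seen as a backslash.
 */
static long
find_literal_end(const unsigned char *s, size_t len, int mb_aware)
{
	size_t		i = 1;			/* skip the opening quote */

	while (i < len)
	{
		if (s[i] == '\'')
			return (long) (i + 1);	/* closing quote */
		if (s[i] == '\\')
			i++;				/* escape: take the next character as-is */
		if (i < len)
			i += mb_aware ? (size_t) big5_mblen(s + i) : 1;
	}
	return -1;
}
```

For the bytes `'` 0xa5 0x5c `'`, the Big5-aware scan returns 4, while the byte-at-a-time scan treats 0x5c as an escape, swallows the closing quote, and returns -1.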
> >
> > > For example, if we first run initdb -E MULE_INTERNAL, then
> > > SET CLIENT_ENCODING TO 'BIG5', and then
> > > INSERT INTO some_table VALUES (..., 'the last byte of some Big5 char is
> > > backslash\',...),
> > > the SQL INSERT never completes: the psql prompt changes, but psql does
> > > not execute the statement. If we run initdb -E EUC_TW instead, it's OK.
> > > Is this a parsing problem? What do you suggest as a solution?
> >
> > Hmm. initdb -E MULE_INTERNAL should work as well. Let me dig into the
> > problem. It would be nice if you could send me the Big5 data for
> > testing by private mail.
> >
> > BTW, I would not recommend using "SET CLIENT_ENCODING TO 'BIG5'" to
> > change the encoding on the fly, because that way the frontend has no
> > idea what the client encoding is. 7.0.x overcomes this problem by
> > introducing the new \encoding command. For 6.5 or earlier, I would
> > recommend using the PGCLIENTENCODING environment variable.
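A small C sketch of why the environment variable helps: the frontend can read it once, at startup, and pick the right character-length rule before it parses any query text, whereas a SET CLIENT_ENCODING typed as an ordinary query is only interpreted by the backend. This is an illustration under stated assumptions, not libpq's actual code; choose_mblen, name_eq, and the two length functions are hypothetical, and only Big5 is handled.

```c
#define _POSIX_C_SOURCE 200112L	/* POSIX setenv()/unsetenv() for callers */
#include <ctype.h>
#include <stdlib.h>

typedef int (*mblen_fn) (const unsigned char *s);

static int
single_byte_mblen(const unsigned char *s)
{
	(void) s;
	return 1;
}

/* A Big5 lead byte is above 0x7f and starts a 2-byte character. */
static int
big5_mblen(const unsigned char *s)
{
	return (*s > 0x7f) ? 2 : 1;
}

/* ASCII case-insensitive string equality. */
static int
name_eq(const char *a, const char *b)
{
	while (*a && *b)
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
		a++, b++;
	}
	return *a == *b;
}

/*
 * Hypothetical frontend logic: decide the character-length rule once, from
 * PGCLIENTENCODING.  A later "SET CLIENT_ENCODING" sent as a query never
 * reaches this code, so the frontend would keep using the old rule.
 */
static mblen_fn
choose_mblen(void)
{
	const char *enc = getenv("PGCLIENTENCODING");

	if (enc != NULL && name_eq(enc, "BIG5"))
		return big5_mblen;
	return single_byte_mblen;	/* only Big5 is handled in this sketch */
}
```

In practice this means exporting PGCLIENTENCODING=BIG5 in the shell before starting psql, rather than issuing the SET command inside the session.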
> >
> > > 2.) Is MULE_INTERNAL faster than EUC_TW as the backend encoding when
> > > PostgreSQL processes Big5 data? (Judging from the mb sources,
> > > BIG5->big52mic()->mic2euc_tw()->EUC_TW needs 2 conversion steps, but
> > > BIG5->big52mic()->MULE_INTERNAL needs only one.)
> >
> > Yes, but the difference would be very small. The expensive part is the
> > table look-up in big52mic.
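The real big52mic table and search code are not shown in this thread, so the following is only a sketch of why a table look-up dominates: each character costs a binary search over a sorted range table, while mic2euc_tw merely copies and adjusts bytes. The range_map layout, lookup_range, and every table value below are hypothetical placeholders, NOT the real Big5 to CNS 11643 mapping.

```c
#include <stddef.h>

/* HYPOTHETICAL table: each entry maps a run of consecutive source codes
 * starting at 'from' onto consecutive target codes starting at 'to'.
 * Values are placeholders, NOT the real Big5 -> CNS 11643 mapping. */
typedef struct
{
	unsigned short from;
	unsigned short to;
} range_map;

static const range_map demo_table[] = {
	{0xA140, 0x2121},
	{0xA1A1, 0x2160},
	{0xA440, 0x4421},
};

#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

/* Binary search for the last entry whose 'from' <= code, then offset into
 * its run.  Returns 0 if code precedes the first run.  A real converter
 * would also reject codes that fall into gaps between runs. */
static unsigned short
lookup_range(const range_map *table, size_t n, unsigned short code)
{
	size_t		lo = 0;
	size_t		hi = n;			/* invariant: table[lo].from <= code */

	if (n == 0 || code < table[0].from)
		return 0;
	while (hi - lo > 1)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (table[mid].from <= code)
			lo = mid;
		else
			hi = mid;
	}
	return (unsigned short) (table[lo].to + (code - table[lo].from));
}
```

The search costs O(log n) comparisons per character, whereas the extra mic2euc_tw step is a constant-time byte rewrite, which is consistent with the difference between the two backend encodings being small.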
> >
> > BTW 7.1 will support automatic encoding conversion between Unicode
> > (UTF-8) and Big5 (or EUC_TW). Try the snapshot if you like.
> >
> > > 3.) Does PostgreSQL's ODBC driver support mb?
> >
> > I don't think so. For Japanese (EUC_JP/SJIS), Kataoka has made patches
> > to enable MB support in ODBC. Supporting EUC_TW/Big5 should not be very
> > difficult, but I don't know for sure.
> > --
> > Tatsuo Ishii
> >
>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026