| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Anders Hermansen <anders(at)yoyo(dot)no> | 
| Cc: | Guillaume Cottenceau <gc(at)mnc(dot)ch>, pgsql-jdbc(at)postgresql(dot)org | 
| Subject: | Re: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 possible solution | 
| Date: | 2005-04-28 13:58:22 | 
| Message-ID: | 10942.1114696702@sss.pgh.pa.us | 
| Lists: | pgsql-jdbc | 
Anders Hermansen <anders(at)yoyo(dot)no> writes:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> Looking at the source code, it's clear that it's reporting just the
>> first byte of the sequence; the 00 is redundant and probably shouldn't
>> be in the message.
> Yes. Maybe the error messages can be changed so that what actually went
> wrong is more clear? And possibly printing the whole 3-byte sequence?
Any volunteers for that?  The specific message in question is in
src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
        else if ((c & 0xe0) == 0xe0)
            elog(ERROR, "could not convert UTF8 character 0x%04x to ISO8859-1",
                 c);
Aside from being unhelpful as to the exact input data, this is wrong in
another way: it ought to be an ereport() not elog(), because it's
certainly not a can't-happen kind of error.
A little bit of grepping turns up a number of similarly deficient
elog and ereport calls in the src/backend/utils/mb/ tree.
There is more useful code for constructing a character description in
pg_verifymbstr() in src/backend/utils/mb/wchar.c.  Probably what ought
to happen is to split out a small subroutine along the lines of
	char *describe_mb_char(const unsigned char *mbstr, int len)
(returning a palloc'd string "0x....") and then make all the places
that complain about bad multibyte input use it.
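For anyone picking this up, here is a rough, standalone sketch of what such a subroutine might look like. It is only an illustration under stated assumptions: describe_mb_char() does not exist in the tree yet, and malloc() stands in for palloc() so the fragment compiles outside the backend (a real version would palloc the buffer and leave cleanup to the memory context).

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of describe_mb_char(): format the first "len" bytes
 * of "mbstr" as "0x" followed by two lowercase hex digits per byte.
 *
 * Assumption: malloc() is used in place of the backend's palloc(), so the
 * caller must free() the result.  Returns NULL if allocation fails.
 */
char *
describe_mb_char(const unsigned char *mbstr, int len)
{
    char       *buf;
    char       *p;
    int         i;

    if (len < 0)
        len = 0;

    /* "0x" prefix + two hex digits per byte + terminating NUL */
    buf = malloc((size_t) len * 2 + 3);
    if (buf == NULL)
        return NULL;

    p = buf;
    p += sprintf(p, "0x");
    for (i = 0; i < len; i++)
        p += sprintf(p, "%02x", (unsigned int) mbstr[i]);

    return buf;
}
```

Called with a pointer to the start of the offending sequence and its byte length, this yields the whole sequence in one string rather than just the first byte, which is what the error message would then print.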
Don't have time to deal with it myself, but it seems like a pretty easy
project for anyone wanting to dip their toes in the backend.
regards, tom lane