
Re: handling unconvertible error messages

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	craig(at)2ndquadrant(dot)com
Cc:	peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: handling unconvertible error messages
Date:	2016-07-28 06:52:19
Message-ID:	20160728.155219.130482473.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Wed, 27 Jul 2016 19:53:01 +0800, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote in <CAMsr+YFL0b1886tMYF9RPeDdpWryG1cr8ew3pYfiXgrJofpHjA(at)mail(dot)gmail(dot)com>
> On 25 July 2016 at 22:43, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com
> > wrote:
>
> > Example: I have a database cluster initialized with --locale=ru_RU.UTF-8
> > (built with NLS). Let's say for some reason, I have client encoding set
> > to LATIN1. All error messages come back like this:
> >
> > test=> select * from notthere;
> > ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no
> > equivalent in encoding "LATIN1"
> >
> > There is no straightforward way for the client to learn that there is a
> > real error message, but it could not be converted.
> >
> > I think ideally we could make this better in two ways:
> >
> > 1) Send the original error message untranslated. That would require
> > saving the original error message in errmsg(), errdetail(), etc. That
> > would be a lot of work for only the occasional use. But it would also
> > facilitate an occasionally-requested feature of writing untranslated
> > error messages into the server log or the csv log, while sending
> > translated messages to the client (or some variant thereof).
> >
> > 2) Send an indication that there was an encoding problem. Maybe a
> > NOTICE, or an error context? Wiring all this into elog.c looks a bit
> > tricky, however.
> >
> >
> We have a similar problem with the server logs. But there there's also an
> additional problem: if there isn't any character mapping issue we just
> totally ignore text encoding concerns and log in whatever encoding the
> client asked the backend to use into the log files. So log files can be a
> line-by-line mix of UTF-8, ISO-8859-1, and whatever other fun encodings
> someone asks for. There is *no* way to correctly read such a file since
> lines don't have any marking as to their encoding and no tools out there
> support line-by-line differently encoded text files anyway.

Cyrillic messages that fail conversion this way look just like a
series of '?' delimited by spaces. The same occurs for Japanese
(or CJK in general, as a family of similar scripts), which contains
(almost) no letters in common with ASCII. We are sometimes
obliged to count the '?'s to identify messages like the
following:

> $ LANG=C postgres
> ?????????: ??????? ?? ???? ?????????: 2016-07-28 14:08:32 JST
> ?????????: ?????? ?? ????????? ???????????????? ?????? ????????
> ?????????: ??????? ?? ?????? ????????? ???????????
> ?????????: ??????? ??????? ??????????? ??????
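
As a rough illustration of why such output carries almost no
information, here is a small Python sketch. It is not PostgreSQL or
gettext code; Python's "replace" error handler merely stands in for
the '?' substitution above, and the function name and sample strings
are made up for the example.

```python
# Illustration only: Python's "replace" error handler stands in for the
# '?' substitution shown above. This is not PostgreSQL or gettext code.
def mask_unencodable(message: str, encoding: str = "ascii") -> str:
    """Encode message into encoding, turning unmappable characters into '?'."""
    return message.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample messages, chosen only for the demonstration.
russian = "СООБЩЕНИЕ: система БД готова"
japanese = "データベースシステム"

# Spaces and ASCII punctuation survive, so Cyrillic keeps its word lengths.
print(mask_unencodable(russian))
# Japanese has no spaces between words, so only the total length survives.
print(mask_unencodable(japanese))
```

Only the word lengths remain for Cyrillic, and for Japanese not even
that, which is why we end up counting question marks.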

> I'm not sure how closely it ties in to the issue you mention, but I think
> it's at least related enough to keep in mind while considering the
> client_encoding issue.

The issue this thread is about is a failure of the character code
conversion performed by backend code, while the other one is
gettext(3)'s behavior according to LC_CTYPE.

I think that data in tables *must* follow the specified encoding
and should result in an error for incompatible characters, but I
don't think the same applies to messages from PostgreSQL.

We Japanese already get such log messages very early in
postmaster startup.

> LOG: データベースシステムは 2016-07-28 14:14:06 JST にシャットダウンしました
> LOG: MultiXact member wraparound protections are now enabled
> LOG: データベースシステムの接続受付準備が整いました。

The reason for the second line is that it just doesn't have a
corresponding translation in ja.po. It is far more acceptable than
the sequence of question marks shown above.
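
The per-message fallback that produces this mixed output can be
sketched roughly as follows. This is a Python illustration, not
gettext itself: the one-entry catalog is a hypothetical stand-in for
ja.po, and the English msgid strings are assumptions.

```python
# Hypothetical, partial catalog standing in for ja.po. The msgid keys are
# assumptions made for this sketch, not taken from the real catalog.
JA_CATALOG = {
    "database system is ready to accept connections":
        "データベースシステムの接続受付準備が整いました。",
}


def translate(msgid: str, catalog: dict) -> str:
    """Return the translation if the catalog has one, else the msgid itself."""
    return catalog.get(msgid, msgid)


for msgid in (
    "MultiXact member wraparound protections are now enabled",
    "database system is ready to accept connections",
):
    print("LOG: " + translate(msgid, JA_CATALOG))
```

The untranslated line stays readable English, which is the behavior I
would like to see for conversion failures as well.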

> I suggest (3) "log the message with unmappable characters masked". Though I
> would definitely like to be able to also send the raw original, along with
> a field indicating the encoding of the original since it won't be the
> client_encoding, since we need some way to get to the info.

So, I don't think (3) will do much for these languages. I prefer
(1) for this issue. Putting aside the log issue, the error system
of PostgreSQL is already doing a very similar thing in
err_sendstring for error-recursion cases.

It seems possible to add a silent fallback for conversion failure
there.
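
A minimal sketch of what I mean by a silent fallback, written in
Python rather than as elog.c code. The function name and the sample
messages are hypothetical, and it assumes the untranslated message is
kept around, as in (1).

```python
def message_for_client(translated: str, untranslated: str,
                       client_encoding: str) -> bytes:
    """Return the translated message in client_encoding if it converts,
    otherwise silently fall back to the untranslated message."""
    try:
        return translated.encode(client_encoding)
    except UnicodeEncodeError:
        # Assumption: the untranslated message is (nearly) plain ASCII, so it
        # converts to any client encoding. "replace" guards the rare exception.
        return untranslated.encode(client_encoding, errors="replace")


# Hypothetical message pair used only for the demonstration.
translated = 'ОШИБКА: отношение "notthere" не существует'
untranslated = 'ERROR: relation "notthere" does not exist'

print(message_for_client(translated, untranslated, "latin-1"))
print(message_for_client(translated, untranslated, "utf-8").decode("utf-8"))
```

With client encoding LATIN1 the client gets the English text instead
of a complaint about the byte sequence, and with UTF8 nothing changes.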

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: handling unconvertible error messages at 2016-07-27 11:53:01 from Craig Ringer
