Re: UTF8 national character data type support WIP patch and list of open issues.

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tatsuo Ishii" <ishii(at)postgresql(dot)org>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Boguk, Maksym" <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-09-21 00:36:20
Message-ID: 75E3DA50779B4FB79FB3DFC0211E8F8A@maumau
Lists: pgsql-hackers

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
> On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org>
> wrote:
>> What about limiting the use of NCHAR to databases that have the same
>> encoding or a "compatible" encoding (one for which an encoding
>> conversion is defined)? This way, NCHAR text can be automatically
>> converted from the NCHAR encoding to the database encoding on the
>> server side, so we can treat NCHAR exactly the same as CHAR afterward.
>> I suppose the encoding used for NCHAR should be defined at initdb time
>> or at database creation (if we allow this, we need to add a new column
>> to record which encoding is used for NCHAR).
>>
>> For example, "CREATE TABLE t1(t NCHAR(10))" will succeed if NCHAR is
>> UTF-8 and the database encoding is UTF-8. It will even succeed if NCHAR
>> is SHIFT-JIS and the database encoding is UTF-8, because there is a
>> conversion between UTF-8 and SHIFT-JIS. However, it will not succeed if
>> NCHAR is SHIFT-JIS and the database encoding is ISO-8859-1, because
>> there's no conversion between them.
>
> I think the point here is that, at least as I understand it, encoding
> conversion and sanitization happen at a very early stage right now,
> when we first receive the input from the client. If the user sends a
> string of bytes as part of a query or bind placeholder that's not
> valid in the database encoding, it's going to error out before any
> type-specific code has an opportunity to get control. Look at
> textin(), for example. There's no encoding check there. That means
> it's already been done at that point. To make this work, someone's
> going to have to figure out what to do about *that*. Until we have a
> sketch of what the design for that looks like, I don't see how we can
> credibly entertain more specific proposals.

OK, I see your point. Let's consider that design. I'll study the code
involved. Does anybody, especially Tatsuo-san, Tom-san, or Peter-san,
have any good ideas?
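
To make the ordering problem concrete, here is a rough toy model of what
Robert describes. It is not PostgreSQL code: the names receive_from_client,
nchar_in and run_query are hypothetical, and Python's codecs merely stand in
for the server's encoding conversion. It only illustrates that the whole
message is converted to the database encoding before any type-specific input
function gets control.

```python
# Toy model of the ordering problem; this is NOT PostgreSQL code.
# receive_from_client, nchar_in and run_query are hypothetical names.

calls = []  # records which stages actually ran


def receive_from_client(raw: bytes, client_encoding: str, db_encoding: str) -> bytes:
    """Convert the whole incoming message to the database encoding."""
    calls.append("receive_from_client")
    text = raw.decode(client_encoding)  # raises if the bytes are invalid
    return text.encode(db_encoding)     # raises if not representable


def nchar_in(value: bytes, db_encoding: str) -> str:
    """Stand-in for a type input function; it only sees converted data."""
    calls.append("nchar_in")
    return value.decode(db_encoding)


def run_query(raw: bytes, client_encoding: str, db_encoding: str) -> str:
    converted = receive_from_client(raw, client_encoding, db_encoding)
    return nchar_in(converted, db_encoding)


# HIRAGANA LETTER A, sent by a Shift-JIS client.
sjis_bytes = "\u3042".encode("shift_jis")

# UTF-8 database: a conversion exists, so the value reaches nchar_in.
ok = run_query(sjis_bytes, "shift_jis", "utf-8")

# ISO-8859-1 database: the early conversion fails, so nchar_in never runs.
calls.clear()
try:
    run_query(sjis_bytes, "shift_jis", "iso-8859-1")
    failed_early = False
except UnicodeError:
    failed_early = "nchar_in" not in calls
```

In the second case the error is raised inside receive_from_client, so a
type-level rule for NCHAR would have no chance to intervene unless that early
step is redesigned.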

Regards
MauMau
