From: | "MauMau" <maumau307(at)gmail(dot)com> |
---|---|
To: | "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tatsuo Ishii" <ishii(at)postgresql(dot)org> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Boguk, Maksym" <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: UTF8 national character data type support WIP patch and list of open issues. |
Date: | 2013-09-21 00:36:20 |
Message-ID: | 75E3DA50779B4FB79FB3DFC0211E8F8A@maumau |
Lists: | pgsql-hackers |
From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
> On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org>
> wrote:
>> What about limiting to use NCHAR with a database which has same
>> encoding or "compatible" encoding (on which the encoding conversion is
>> defined)? This way, NCHAR text can be automatically converted from
>> NCHAR to the database encoding in the server side thus we can treat
>> NCHAR exactly same as CHAR afterward. I suppose what encoding is used
>> for NCHAR should be defined in initdb time or creation of the database
>> (if we allow this, we need to add a new column to know what encoding
>> is used for NCHAR).
>>
>> For example, "CREATE TABLE t1(t NCHAR(10))" will succeed if NCHAR is
>> UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is
>> SHIFT-JIS and database encoding is UTF-8 because there is a conversion
>> between UTF-8 and SHIFT-JIS. However will not succeed if NCHAR is
>> SHIFT-JIS and database encoding is ISO-8859-1 because there's no
>> conversion between them.
>
> I think the point here is that, at least as I understand it, encoding
> conversion and sanitization happens at a very early stage right now,
> when we first receive the input from the client. If the user sends a
> string of bytes as part of a query or bind placeholder that's not
> valid in the database encoding, it's going to error out before any
> type-specific code has an opportunity to get control. Look at
> textin(), for example. There's no encoding check there. That means
> it's already been done at that point. To make this work, someone's
> going to have to figure out what to do about *that*. Until we have a
> sketch of what the design for that looks like, I don't see how we can
> credibly entertain more specific proposals.
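To make the point above concrete, here is a toy model of the ordering Robert describes: conversion and validation at the protocol boundary, and only afterwards a type input function that trusts its input. This is a sketch under stated assumptions, not PostgreSQL code. Every function name below is hypothetical, and Python codecs stand in for the server's conversion routines.

```python
# Toy model only; none of these names come from the PostgreSQL source tree.


def convert_at_protocol_boundary(raw: bytes, client_encoding: str,
                                 database_encoding: str) -> bytes:
    """Decode the client's bytes and re-encode them in the database encoding.

    Raises UnicodeError if the bytes are invalid in client_encoding or if a
    character cannot be represented in database_encoding.
    """
    text = raw.decode(client_encoding)
    return text.encode(database_encoding)


def toy_text_in(already_validated: bytes) -> bytes:
    """Stand-in for a type input function: it trusts its input, no check."""
    return already_validated


def handle_parameter(raw: bytes, client_encoding: str, database_encoding: str):
    """Return (status, value, input_function_was_called)."""
    try:
        converted = convert_at_protocol_boundary(
            raw, client_encoding, database_encoding)
    except UnicodeError:
        # The error is raised here, before any type-specific code runs.
        return ("rejected", None, False)
    return ("ok", toy_text_in(converted), True)
```

In this model a Shift-JIS string sent to a Latin-1 database is rejected without `toy_text_in` ever being called, which is why a per-type NCHAR encoding cannot be handled by the type's input function alone.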
OK, I see your point. Let's consider that design. I'll study the relevant
code. Does anybody, especially Tatsuo-san, Tom-san, or Peter-san,
have any good ideas?
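As a reference point for that design discussion, the compatibility rule from Tatsuo's quoted proposal (NCHAR is usable when its encoding equals the database encoding or a conversion to it is defined) can be written down as a small check. This is only a sketch: the conversion table is hand-written to mirror the quoted examples and is not taken from any real catalog, and the function name is hypothetical.

```python
# Hypothetical, hand-written table of (source, destination) conversions.
# It is NOT a dump of any real catalog; it only mirrors the quoted examples.
KNOWN_CONVERSIONS = {
    ("SJIS", "UTF8"),
    ("UTF8", "SJIS"),
    ("LATIN1", "UTF8"),
    ("UTF8", "LATIN1"),
}


def nchar_encoding_allowed(nchar_encoding: str, database_encoding: str,
                           conversions=KNOWN_CONVERSIONS) -> bool:
    """Apply the proposed rule: same encoding, or a defined conversion."""
    if nchar_encoding == database_encoding:
        return True
    # NCHAR data would be converted into the database encoding on the server.
    return (nchar_encoding, database_encoding) in conversions
```

With this table, `nchar_encoding_allowed("SJIS", "UTF8")` is true and `nchar_encoding_allowed("SJIS", "LATIN1")` is false, matching the three cases in the quoted example.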
Regards
MauMau