From: | Chapman Flack <chap(at)anastigmatix(dot)net> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Nico Williams <nico(at)cryptonector(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Pre-proposal: unicode normalized text |
Date: | 2023-10-04 18:02:50 |
Message-ID: | 43864caecd65d02081d623d05ccc1683@anastigmatix.net |
Lists: | pgsql-hackers |
On 2023-10-04 13:47, Robert Haas wrote:
> On Wed, Oct 4, 2023 at 1:27 PM Nico Williams <nico(at)cryptonector(dot)com> wrote:
>> A UTEXT type would be helpful for specifying that the text must be
>> Unicode (in which transform?) even if the character data encoding for
>> the database is not UTF-8.
>
> That's actually pretty thorny ... because right now client_encoding
> specifies the encoding to be used for all data sent to the client. So
> would we convert the data from UTF8 to the selected client encoding?
The SQL standard would have me able to:
    CREATE TABLE foo (
      a CHARACTER VARYING CHARACTER SET UTF8,
      b CHARACTER VARYING CHARACTER SET LATIN1
    )
and so on, and write character literals like
    _UTF8'Hello, world!' and _LATIN1'Hello, world!'
and have those columns and data types independently contain what
they can contain, without constraints imposed by one overall
database encoding.
Obviously, we're far from being able to do that. But should it
become desirable to get closer, would it be worthwhile to also
try to follow how the standard would have it look?
Clearly, part of the job would involve making the wire protocol
able to transmit binary values and identify their encodings.
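A minimal Python sketch of why the encoding would have to travel with each value. The framing is made up purely for illustration: the one-byte name length, the four-byte payload length, and the helpers `encode_tagged` and `decode_tagged` are hypothetical and are not part of PostgreSQL's actual wire protocol. Python's codec names (`utf-8`, `latin-1`) stand in for PostgreSQL's `UTF8` and `LATIN1`.

```python
import struct

# HYPOTHETICAL framing, for illustration only (not PostgreSQL's protocol):
#   1 byte: length of encoding name, then the name in ASCII,
#   4 bytes big-endian: length of payload, then the payload bytes.

def encode_tagged(text: str, encoding: str) -> bytes:
    """Encode text in the given charset and prefix it with the charset name."""
    payload = text.encode(encoding)  # raises UnicodeEncodeError if unrepresentable
    name = encoding.encode("ascii")
    return (struct.pack(">B", len(name)) + name
            + struct.pack(">I", len(payload)) + payload)

def decode_tagged(frame: bytes) -> tuple[str, str]:
    """Read the encoding tag first, then decode the payload with it."""
    name_len = frame[0]
    name = frame[1:1 + name_len].decode("ascii")
    (payload_len,) = struct.unpack(">I", frame[1 + name_len:5 + name_len])
    start = 5 + name_len
    return name, frame[start:start + payload_len].decode(name)

word = "café"
utf8_frame = encode_tagged(word, "utf-8")
latin1_frame = encode_tagged(word, "latin-1")

# Same text, different bytes: 'é' is two bytes in UTF-8, one in LATIN1.
print(word.encode("utf-8"))    # b'caf\xc3\xa9'
print(word.encode("latin-1"))  # b'caf\xe9'

# With the tag, a receiver can decode either frame correctly.
print(decode_tagged(utf8_frame))    # ('utf-8', 'café')
print(decode_tagged(latin1_frame))  # ('latin-1', 'café')

# Some UTF8 values have no LATIN1 form at all, which is what makes
# converting everything to one client_encoding thorny.
try:
    encode_tagged("€5", "latin-1")
except UnicodeEncodeError:
    print("'€' cannot be represented in latin-1")
```

The last step mirrors the client_encoding concern quoted above: tagging each value avoids a lossy conversion, whereas forcing one client encoding can fail outright.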
Regards,
-Chap