Re: Pre-proposal: unicode normalized text

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Nico Williams <nico(at)cryptonector(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-proposal: unicode normalized text
Date: 2023-10-06 19:15:16
Message-ID: CAMsGm5c86VfCeqJe-2O32ph7RLEJ0xVL3XRahPTg6YJcxahzLw@mail.gmail.com
Lists: pgsql-hackers

On Fri, 6 Oct 2023 at 15:07, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Fri, 2023-10-06 at 13:33 -0400, Robert Haas wrote:
> > What I think people really want is a whole column in
> > some encoding that isn't the normal one for that database.
>
> Do people really want that? I'd be curious to know why.
>
> A lot of modern projects are simply declaring UTF-8 to be the "one true
> way". I am not suggesting that we do that, but it seems odd to go in
> the opposite direction and have greater flexibility for many encodings.
>

And even if they want it, we can give it to them when we send data to or
accept it from the client; just because they want to store ISO-8859-1 doesn't
mean the actual bytes on the disk need to be that. And by "client" maybe I
mean the client end of the network connection, and maybe I mean the program
that is calling into libpq.

If they try to submit data that cannot possibly be encoded in the stated
encoding because the bytes they submit don't correspond to any string in
that encoding, then that is unambiguously an error, just as trying to put
February 30 in a date column is an error.
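
To make the two points above concrete, here is a rough sketch in Python (not PostgreSQL code). It assumes a single fixed storage encoding, UTF-8, and converts only at the client boundary. The names STORAGE_ENCODING, accept_from_client, send_to_client and is_rejected are made up for illustration; they are not libpq or backend functions.

```python
import datetime

# Assumption for this sketch: the server stores text in one fixed encoding.
STORAGE_ENCODING = "utf-8"


def accept_from_client(raw: bytes, client_encoding: str) -> bytes:
    """Validate bytes against the client's stated encoding and return
    the bytes that would actually be stored."""
    # decode() raises UnicodeDecodeError when the bytes do not correspond
    # to any string in the stated encoding.
    text = raw.decode(client_encoding)
    return text.encode(STORAGE_ENCODING)


def send_to_client(stored: bytes, client_encoding: str) -> bytes:
    """Convert stored bytes back to whatever encoding the client asked for."""
    text = stored.decode(STORAGE_ENCODING)
    # encode() raises UnicodeEncodeError if a character has no
    # representation in the client's encoding.
    return text.encode(client_encoding)


def is_rejected(raw: bytes, client_encoding: str) -> bool:
    """Return True if the input is invalid in the stated encoding."""
    try:
        accept_from_client(raw, client_encoding)
    except UnicodeDecodeError:
        return True
    return False


# The client speaks ISO-8859-1; the stored bytes are UTF-8 regardless.
latin1_input = b"caf\xe9"
stored = accept_from_client(latin1_input, "iso-8859-1")
round_trip = send_to_client(stored, "iso-8859-1")

# Bytes that are not valid in the stated encoding are simply an error ...
bad_utf8_rejected = is_rejected(b"\xff\xfe", "utf-8")

# ... in the same way that February 30 is not a valid date.
try:
    datetime.date(2023, 2, 30)
    bad_date_rejected = False
except ValueError:
    bad_date_rejected = True
```

The client gets back exactly the ISO-8859-1 bytes it sent, yet nothing about the stored representation depended on the client's choice of encoding.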

Is there a single other data type where anybody is even discussing letting
the client tell us how to write the data on disk?
