Re: Pre-proposal: unicode normalized text

From: Nico Williams <nico(at)cryptonector(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-proposal: unicode normalized text
Date: 2023-10-04 23:43:37
Message-ID: ZR34qVfjnQCspu/m@ubby21
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 04, 2023 at 04:01:26PM -0700, Jeff Davis wrote:
> On Wed, 2023-10-04 at 16:15 -0500, Nico Williams wrote:
> > Better that than TEXT blobs w/ the encoding given by the `CREATE
> > DATABASE` or `initdb` default!
>
> From an engineering perspective, yes, per-column encodings would be
> more flexible. But I still don't understand who exactly would use that,
> and why.

Say you have a bunch of text files in different encodings for reasons
(historical). And now say you want to store them in a database so you
can index them and search them. Sure, you could use a filesystem, but
you want an RDBMS. Well, the answer to this is "convert all those files
to UTF-8".

> It would take an awful lot of effort to implement and make the code
> more complex, so we'd really need to see some serious demand for that.

Yes, it's better to just use UTF-8.

The DB could implement conversions to/from other codesets and encodings
for clients that insist on it. Why would clients insist anyways?
Better to do the conversions at the clients.

In the middle its best to just have Unicode, and specifically UTF-8,
then push all conversions to the edges of the system.

Nico
--

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Fredouille 2023-10-05 00:04:41 Re: unnest multirange, returned order
Previous Message Jeff Davis 2023-10-04 23:01:26 Re: Pre-proposal: unicode normalized text