Re: Pre-proposal: unicode normalized text

From:	Nico Williams <nico(at)cryptonector(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-04 23:43:37
Message-ID:	ZR34qVfjnQCspu/m@ubby21
Lists:	pgsql-hackers

On Wed, Oct 04, 2023 at 04:01:26PM -0700, Jeff Davis wrote:
> On Wed, 2023-10-04 at 16:15 -0500, Nico Williams wrote:
> > Better that than TEXT blobs w/ the encoding given by the `CREATE
> > DATABASE` or `initdb` default!
>
> From an engineering perspective, yes, per-column encodings would be
> more flexible. But I still don't understand who exactly would use that,
> and why.

Say you have a bunch of text files in different encodings for reasons
(historical). And now say you want to store them in a database so you
can index them and search them. Sure, you could use a filesystem, but
you want an RDBMS. Well, the answer to this is "convert all those files
to UTF-8".
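As a rough illustration of that conversion step, here is a minimal Python sketch. It assumes the source encoding of each file is already known; the helper name `to_utf8` and the sample bytes are hypothetical and only there for illustration.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # A wrong source_encoding either raises UnicodeDecodeError or silently
    # produces mojibake, so the encoding has to be known, not guessed.
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical inputs: the same word as stored under two legacy encodings.
latin1_bytes = "café".encode("latin-1")
cp1252_bytes = "café".encode("cp1252")

# After conversion both inputs are byte-for-byte identical, so they can be
# indexed and compared uniformly.
assert to_utf8(latin1_bytes, "latin-1") == to_utf8(cp1252_bytes, "cp1252")
```

Once everything is UTF-8, the database only ever sees one encoding, which is the point of converting before loading.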

> It would take an awful lot of effort to implement and make the code
> more complex, so we'd really need to see some serious demand for that.

Yes, it's better to just use UTF-8.

The DB could implement conversions to/from other codesets and encodings
for clients that insist on it. Why would clients insist anyways?
Better to do the conversions at the clients.

In the middle it's best to just have Unicode, and specifically UTF-8,
then push all conversions to the edges of the system.
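A small sketch of what "conversions at the edges" could look like, assuming a client that insists on a legacy encoding. The function names `ingest` and `emit` are made up for this example and are not part of any PostgreSQL or driver API.

```python
def ingest(raw: bytes, client_encoding: str) -> bytes:
    """Input edge: convert whatever the client sent into UTF-8 for storage."""
    return raw.decode(client_encoding).encode("utf-8")


def emit(stored: bytes, client_encoding: str) -> bytes:
    """Output edge: convert stored UTF-8 into the encoding the client wants."""
    # Strict encoding: raises UnicodeEncodeError if the client's encoding
    # cannot represent a stored character.
    return stored.decode("utf-8").encode(client_encoding)


# Hypothetical client that speaks Shift_JIS.
sent = "日本語".encode("shift_jis")
stored = ingest(sent, "shift_jis")   # everything in the middle is UTF-8
returned = emit(stored, "shift_jis")  # converted back only on the way out

assert stored == "日本語".encode("utf-8")
assert returned == sent
```

Only the two edge functions know about the legacy encoding; anything between them can assume UTF-8.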

Nico
--

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-04 23:01:26 from Jeff Davis
