From: | Nico Williams <nico(at)cryptonector(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Chapman Flack <chap(at)anastigmatix(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Pre-proposal: unicode normalized text |
Date: | 2023-10-04 23:43:37 |
Message-ID: | ZR34qVfjnQCspu/m@ubby21 |
Lists: | pgsql-hackers |
On Wed, Oct 04, 2023 at 04:01:26PM -0700, Jeff Davis wrote:
> On Wed, 2023-10-04 at 16:15 -0500, Nico Williams wrote:
> > Better that than TEXT blobs w/ the encoding given by the `CREATE
> > DATABASE` or `initdb` default!
>
> From an engineering perspective, yes, per-column encodings would be
> more flexible. But I still don't understand who exactly would use that,
> and why.
Say you have a bunch of text files in different encodings for reasons
(historical). And now say you want to store them in a database so you
can index them and search them. Sure, you could use a filesystem, but
you want an RDBMS. Well, the answer to this is "convert all those files
to UTF-8".
> It would take an awful lot of effort to implement and make the code
> more complex, so we'd really need to see some serious demand for that.
Yes, it's better to just use UTF-8.
The DB could implement conversions to/from other codesets and encodings
for clients that insist on it. Why would clients insist anyways?
Better to do the conversions at the clients.
In the middle it's best to just have Unicode, and specifically UTF-8,
and to push all conversions to the edges of the system.
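
A rough sketch of what "conversions at the edges" could look like, assuming each input's encoding is already known. The names `ingest`, `emit`, and `store` are made up for illustration and are not from any PostgreSQL API:

```python
def ingest(raw: bytes, source_encoding: str) -> bytes:
    """Input edge: decode bytes from a known legacy encoding, return UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def emit(stored: bytes, client_encoding: str) -> bytes:
    """Output edge: re-encode stored UTF-8 for a client that insists on another encoding."""
    return stored.decode("utf-8").encode(client_encoding)


# Hypothetical inputs: the same word arriving in two legacy encodings.
latin1_doc = "caf\u00e9".encode("latin-1")
cp1252_doc = "caf\u00e9".encode("cp1252")

# The middle of the system only ever holds UTF-8.
store = [ingest(latin1_doc, "latin-1"), ingest(cp1252_doc, "cp1252")]
```

Once ingested, both documents are byte-identical UTF-8, so indexing and searching never need to know where they came from. A client that still wants Latin-1 gets it back through `emit` at the way out.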
Nico