Re: Pre-proposal: unicode normalized text

From:	Nico Williams <nico(at)cryptonector(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-04 21:15:06
Message-ID:	ZR3V2vmpfxnDw3Q0@ubby21
Lists:	pgsql-hackers

On Wed, Oct 04, 2023 at 01:38:15PM -0700, Jeff Davis wrote:
> On Wed, 2023-10-04 at 14:02 -0400, Chapman Flack wrote:
> > The SQL standard would have me able to:
> >
> > [...]
> > _UTF8'Hello, world!' and _LATIN1'Hello, world!'
>
> Is there a use case for that? UTF-8 is able to encode any unicode code
> point, it's relatively compact, and it's backwards-compatible with 7-
> bit ASCII. If you have a variety of text data in your system (and in
> many cases even if not), then UTF-8 seems like the right solution.
>
> Text data encoded 17 different ways requires a lot of bookkeeping in
> the type system, and it also requires injecting a bunch of fallible
> transcoding operators around just to compare strings.

Better that than TEXT blobs whose encoding is whatever the `CREATE
DATABASE` or `initdb` default happens to be!

It'd be a lot _less_ fragile to have all text tagged with an encoding
(indirectly, via its type, which then denotes the encoding).

That would be a lot of work, but starting with just a UTF-8 text type
would be an improvement.
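
To make the "encoding carried by the type" idea concrete, here is a minimal
sketch in Python rather than in PostgreSQL's type system. The names
`EncodedText`, `Utf8Text`, `Latin1Text`, and `transcode` are hypothetical and
exist only for illustration; nothing here reflects an actual or proposed
PostgreSQL API. It assumes validation happens once at construction and that
comparing differently tagged values requires an explicit, fallible transcode.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class EncodedText:
    """Bytes whose encoding is fixed by the (hypothetical) subclass."""

    raw: bytes
    ENCODING: ClassVar[str] = "ascii"

    def __post_init__(self):
        # Validate once, at construction: raises UnicodeDecodeError if the
        # bytes are not well-formed in this type's encoding.
        self.raw.decode(self.ENCODING)

    def transcode(self, target):
        # Fallible conversion: raises UnicodeEncodeError when the target
        # encoding cannot represent some character.
        text = self.raw.decode(self.ENCODING)
        return target(text.encode(target.ENCODING))


class Utf8Text(EncodedText):
    ENCODING: ClassVar[str] = "utf-8"


class Latin1Text(EncodedText):
    ENCODING: ClassVar[str] = "latin-1"


utf8_value = Utf8Text("café".encode("utf-8"))
latin1_value = Latin1Text("café".encode("latin-1"))

# Differently tagged values never compare equal by accident; the caller
# has to transcode one side first.
same_without_transcode = utf8_value == latin1_value
same_after_transcode = latin1_value.transcode(Utf8Text) == utf8_value
```

The dataclass-generated `__eq__` only compares instances of the same class, so
the tag takes part in equality without extra code. Starting with only
`Utf8Text` would already rule out invalid byte sequences at the point where a
value is created, without any of the cross-encoding bookkeeping.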

Nico
--

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-04 20:38:15 from Jeff Davis

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-04 23:01:26 from Jeff Davis
