Re: Pre-proposal: unicode normalized text

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Chapman Flack <chap(at)anastigmatix(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Nico Williams <nico(at)cryptonector(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-04 20:38:15
Message-ID:	a415e27830b8c94cea1b1c4bd60d254f0f397866.camel@j-davis.com
Lists:	pgsql-hackers

On Wed, 2023-10-04 at 14:02 -0400, Chapman Flack wrote:
> The SQL standard would have me able to:
>
> CREATE TABLE foo (
> a CHARACTER VARYING CHARACTER SET UTF8,
> b CHARACTER VARYING CHARACTER SET LATIN1
> )
>
> and so on, and write character literals like
>
> _UTF8'Hello, world!' and _LATIN1'Hello, world!'

Is there a use case for that? UTF-8 can encode any Unicode code point,
it's relatively compact, and it's backwards-compatible with 7-bit ASCII.
If you have a variety of text data in your system (and in many cases
even if not), then UTF-8 seems like the right solution.

Text data encoded 17 different ways requires a lot of bookkeeping in
the type system, and it also requires injecting a bunch of fallible
transcoding operators around just to compare strings.
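
To make the "fallible transcoding" point concrete, here is a minimal
sketch in Python rather than anything PostgreSQL-specific. The helper
names (transcode, equal_across_encodings) are made up for illustration
and do not correspond to any existing or proposed PostgreSQL function;
the sketch only assumes Python's built-in "utf-8" and "latin-1" codecs.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode bytes from one charset to another.

    Raises UnicodeDecodeError if data is not valid in src, and
    UnicodeEncodeError if some character has no representation in dst.
    """
    return data.decode(src).encode(dst)


def equal_across_encodings(a: bytes, a_enc: str, b: bytes, b_enc: str) -> bool:
    """Hypothetical helper: compare two strings stored in different
    encodings by decoding both to Unicode first."""
    return a.decode(a_enc) == b.decode(b_enc)


# Pure ASCII text has the same bytes in ASCII, UTF-8 and LATIN1.
hello = "Hello, world!"
ascii_compatible = hello.encode("utf-8") == hello.encode("ascii")
same_text = equal_across_encodings(
    hello.encode("utf-8"), "utf-8", hello.encode("latin-1"), "latin-1"
)

# U+20AC (euro sign) exists in Unicode but not in LATIN1 (ISO 8859-1),
# so transcoding UTF-8 -> LATIN1 fails for this value.
try:
    transcode("\u20ac".encode("utf-8"), "utf-8", "latin-1")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# The LATIN1 byte for U+00E9 (0xE9) is not valid UTF-8 on its own,
# so reading LATIN1 data as if it were UTF-8 fails too.
try:
    transcode("\u00e9".encode("latin-1"), "utf-8", "latin-1")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Every comparison between differently encoded values has to go through
something like the decode step above, and each such step can raise an
error depending on the data.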

Regards,
Jeff Davis

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-04 18:02:50 from Chapman Flack

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-04 21:15:06 from Nico Williams
Re: Pre-proposal: unicode normalized text at 2023-10-04 21:32:50 from Chapman Flack
