Re: Pre-proposal: unicode normalized text

From:	Nico Williams <nico(at)cryptonector(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-05 19:14:54
Message-ID:	ZR8LLrk9AJVxEFbX@ubby21
Lists:	pgsql-hackers

On Thu, Oct 05, 2023 at 07:31:54AM -0400, Robert Haas wrote:
> [...] On the other hand, to do that in PostgreSQL, we'd need to
> propagate the character set/encoding information into all of the
> places that currently get the typmod and collation, and that is not a
> small number of places. It's a lot of infrastructure for the project
> to carry around for a feature that's probably only going to continue
> to become less relevant.

Text+encoding can be just like bytea with a one- or two-byte prefix
indicating what codeset+encoding it's in. That'd be how to encode
such text values on the wire, though on disk the column's type should
indicate the codeset+encoding, so no need to add a prefix to the value.
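As a rough illustration of the wire format described above, here is a minimal Python sketch of a one-byte encoding prefix. The tag numbers and the names `ENCODING_TAGS`, `to_wire`, and `from_wire` are hypothetical, made up for this example; they are not PostgreSQL encoding IDs or any existing protocol.

```python
from typing import Tuple

# Hypothetical tag values, invented for illustration only. They do not
# correspond to PostgreSQL's internal encoding IDs.
ENCODING_TAGS = {1: "utf-8", 2: "latin-1", 3: "euc-jp"}
TAGS_BY_NAME = {name: tag for tag, name in ENCODING_TAGS.items()}


def to_wire(text: str, encoding: str) -> bytes:
    """Encode text and prepend a one-byte tag naming its encoding."""
    tag = TAGS_BY_NAME[encoding]
    return bytes([tag]) + text.encode(encoding)


def from_wire(payload: bytes) -> Tuple[str, bytes]:
    """Split a wire value into (encoding name, raw bytes) without converting."""
    if not payload:
        raise ValueError("missing encoding tag")
    return ENCODING_TAGS[payload[0]], payload[1:]
```

Note that `from_wire` hands back the raw bytes untouched; whether to decode them is left to the caller, which matches the "no automatic conversion" idea.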

Complexity would creep in around when and whether to perform automatic
conversions. The easy answer would be "never, on the server side", but
on the client side it might be useful to convert to/from the locale's
codeset+encoding when displaying to the user or accepting user input.

If there are no automatic server-side codeset/encoding conversions, then
the server-side cost of supporting non-UTF-8 text should not be too high
dev-wise -- it's just (famous last words) a generic text type
parameterized by codeset+encoding. There would not even be a hard
need for functions for conversions, though there would be demand for
them.
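To make the "parameterized text type with no automatic conversion" idea concrete, here is a small Python sketch. `EncodedText`, `same_as`, and `convert` are hypothetical names for this example only; nothing here reflects an actual or proposed PostgreSQL type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Hypothetical text value tagged with its encoding; never auto-converts."""
    encoding: str
    raw: bytes

    def same_as(self, other: "EncodedText") -> bool:
        # No implicit conversion: mixing encodings is an error, not a coercion.
        if self.encoding != other.encoding:
            raise TypeError(
                f"cannot compare {self.encoding} with {other.encoding}"
            )
        return self.raw == other.raw


def convert(value: EncodedText, target: str) -> EncodedText:
    """Optional explicit conversion, the kind of function users would ask for."""
    return EncodedText(target, value.raw.decode(value.encoding).encode(target))
```

The type works without `convert`; it is included only to show that conversion, if offered, would be an explicit call rather than something the comparison does on its own.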

But I agree that if there's no need, there's no need. UTF-8 is great,
and if only all PG users would just switch then there's not much more to
do.

Nico
--

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-05 11:31:54 from Robert Haas

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-05 19:49:37 from Tom Lane
Re: Pre-proposal: unicode normalized text at 2023-10-06 17:33:06 from Robert Haas