Re: Pre-proposal: unicode normalized text

From:	"Daniel Verite" <daniel(at)manitou-mail(dot)org>
To:	"Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc:	Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-17 15:07:40
Message-ID:	f39c5194-14cd-4a5b-822f-382c8b425499@manitou-mail.org
Lists:	pgsql-hackers

Jeff Davis wrote:

> I believe the patch has utility as-is, but I've been brainstorming a
> few more ideas that could build on it:
>
> * Add a per-database option to enforce only storing assigned unicode
> code points.

The problem is that the set of assigned code points grows with every
Unicode release, and releases happen about once a year.

If we had had this option in Postgres 11, released in 2018, it would use
Unicode 11, and in 2023 it would reject the thousands of code points
that have been assigned since then.

Aside from that, aborting a transaction because there's an
unassigned code point in a string feels like doing too much,
too late. The programs that want to filter out unwanted code
points do it client-side, before the data reaches the database.
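
To illustrate the client-side approach, here is a minimal sketch in Python,
assuming the application is content to rely on whatever Unicode tables its
own runtime ships. The function names are hypothetical and not part of any
patch in this thread. The check uses the general category Cn, which covers
both reserved (unassigned) code points and noncharacters, and its verdict
depends on unicodedata.unidata_version, so it ages in the same way as the
server-side option discussed above.

```python
import unicodedata


def unassigned_code_points(text):
    """Return the characters of `text` whose general category is Cn.

    Cn ("Other, not assigned") covers reserved code points and
    noncharacters, according to the Unicode tables bundled with this
    Python build (see unicodedata.unidata_version).
    """
    return [ch for ch in text if unicodedata.category(ch) == "Cn"]


def check_before_insert(text):
    """Hypothetical client-side guard: reject the string before it is sent."""
    bad = unassigned_code_points(text)
    if bad:
        listing = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError(f"unassigned code points: {listing}")
    return text


if __name__ == "__main__":
    # Which Unicode version this check is actually enforcing.
    print("Unicode tables:", unicodedata.unidata_version)
    print(check_before_insert("Daniel Vérité"))
```

A string that passes here under one runtime may be rejected under an older
one, which is the versioning problem described above, only moved to the
client where it can be handled without aborting a transaction.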

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-17 03:32:19 from Jeff Davis

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-17 15:12:28 from Robert Haas
Re: Pre-proposal: unicode normalized text at 2023-10-17 16:32:18 from Jeff Davis
Re: Pre-proposal: unicode normalized text at 2023-11-02 23:17:33 from Nico Williams
