Re: Pre-proposal: unicode normalized text

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-proposal: unicode normalized text
Date: 2023-10-11 06:51:27
Message-ID: 3af7e977-660d-4161-85fe-d5f4a205aa3e@eisentraut.org
Lists: pgsql-hackers

On 10.10.23 16:02, Robert Haas wrote:
> On Tue, Oct 10, 2023 at 2:44 AM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>> Can you restate what this is supposed to be for? This thread appears to
>> have morphed from "let's normalize everything" to "let's check for
>> unassigned code points", but I'm not sure what we are aiming for now.
>
> Jeff can say what he wants it for, but one obvious application would
> be to have the ability to add a CHECK constraint that forbids
> inserting unassigned code points into your database, which would be
> useful if you're worried about forward-compatibility with collation
> definitions that might be extended to cover those code points in the
> future.

I don't see how this would really work in practice. Whether your data
has unassigned code points or not, when the collations are updated to
the next Unicode version, the collations will have a new version number,
and so you need to run the refresh procedure in any case.
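To make "unassigned code points" concrete, here is a minimal sketch of the test such a CHECK constraint would perform. This is not PostgreSQL code: it assumes Python's `unicodedata` module as a stand-in for the database's Unicode tables, and `has_unassigned` is a hypothetical helper. The verdict depends on the Unicode version those tables were built from (`unicodedata.unidata_version`), which is the same versioning concern discussed above.

```python
import unicodedata


def has_unassigned(text: str) -> bool:
    """Return True if any code point in text has General_Category "Cn".

    "Cn" covers unassigned code points and noncharacters, as defined by the
    Unicode Character Database version bundled with this Python build.
    """
    return any(unicodedata.category(ch) == "Cn" for ch in text)


if __name__ == "__main__":
    # The result is only meaningful relative to this UCD version; a later
    # Unicode version may assign code points that are unassigned here.
    print("Unicode version:", unicodedata.unidata_version)
    print(has_unassigned("caf\u00e9"))  # U+00E9 is assigned
    print(has_unassigned("\u0378"))     # U+0378 is unassigned in current versions
```

A check like this only rules out code points unknown to one Unicode version. It does not change the fact that a collation built on a newer Unicode version carries a new version number.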
