Re: Pre-proposal: unicode normalized text

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-17 15:12:28
Message-ID:	CA+TgmoYOVdnNL+2B+hizoz0Pgx1k7r_VOfeLp2goJz92NOAhEw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Oct 17, 2023 at 11:07 AM Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> There's a problem in the fact that the set of assigned code points is
> expanding with every Unicode release, which happens about every year.
>
> If we had this option in Postgres 11 released in 2018 it would use
> Unicode 11, and in 2023 this feature would reject thousands of code
> points that have been assigned since then.

Are code points assigned from a gapless sequence? That is, is the
implementation of codepoint_is_assigned(char) just 'codepoint <
SOME_VALUE' and SOME_VALUE increases over time?

If so, we could consider having a function that lets you specify the
bound as an input parameter. But whether anyone would use it, or know
how to set that input parameter, is questionable. The real issue here
is whether you can figure out which of the code points that you could
put into the database already have collation definitions.
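
For anyone who wants to check the gapless question empirically, here is a
rough sketch in Python rather than C. It assumes that "assigned" means
"general category is not Cn", which is only one possible definition, and
the result depends on the Unicode version bundled with the interpreter.
The db parameter is a made-up convenience for swapping in the older
Unicode 3.2.0 tables that Python also ships; nothing like it exists in
Postgres.

```python
import unicodedata


def codepoint_is_assigned(cp: int, db=unicodedata) -> bool:
    """True if cp has a general category other than Cn (unassigned).

    Assumption: "assigned" is taken to mean "category != Cn" in the
    Unicode tables exposed by db.
    """
    return db.category(chr(cp)) != "Cn"


# U+0378 sits between assigned Greek characters but is itself unassigned,
# so a single 'codepoint < SOME_VALUE' test cannot describe the assigned set.
print(codepoint_is_assigned(0x0041))  # LATIN CAPITAL LETTER A
print(codepoint_is_assigned(0x0378))  # hole in the Greek block
print(codepoint_is_assigned(0x037A))  # GREEK YPOGEGRAMMENI, above the hole

# The same code point can flip from unassigned to assigned between
# Unicode versions: U+1F600 did not exist in Unicode 3.2.0.
print(codepoint_is_assigned(0x1F600, unicodedata.ucd_3_2_0))
print(codepoint_is_assigned(0x1F600))
```

With the tables I would expect on a recent Python, the second call reports
a hole below an assigned code point, which suggests the answer to the
gapless question is no, and that any bound-style parameter would really
have to be a Unicode version rather than a single code point value.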

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-17 15:07:40 from Daniel Verite

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-17 15:38:07 from Isaac Morland
