Re: Built-in CTYPE provider

From:	Peter Eisentraut <peter(at)eisentraut(dot)org>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Built-in CTYPE provider
Date:	2024-03-22 14:51:49
Message-ID:	49f18979-bef5-4cf0-bf3d-8a5ed323f470@eisentraut.org
Lists:	pgsql-hackers

On 21.03.24 01:13, Jeff Davis wrote:
>> Are there any test cases that illustrate the word boundary changes in
>> patch 0005? It might be useful to test those against Oracle as well.
> The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8
> collation vs '123Abc' in PG_UNICODE_FAST.
>
> The reason for the latter behavior is that the Unicode Default Case
> Conversion algorithm for toTitlecase() advances to the next Cased
> character before mapping to titlecase, and digits are not Cased. ICU
> has a configurable adjustment, and defaults in a way that produces
> '123abc'.

I think this might be too big of a compatibility break. So far,
initcap('123abc') has always returned '123abc'. If the new collation
returns '123Abc' now, then that's quite a change. These are not some
obscure Unicode special case characters, after all.
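
To make the difference concrete, here is a minimal Python sketch of the two behaviors being compared. It is not the code from the patch or from ICU. The function names are hypothetical, word boundaries are simplified to runs of alphanumeric characters, the Cased property is approximated by the general categories Lu, Ll and Lt, and str.upper() stands in for a real titlecase mapping.

```python
import unicodedata


def is_cased(ch):
    # Assumption: approximate the Unicode "Cased" property with the
    # general categories Lu, Ll and Lt. The real property is broader.
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")


def initcap_first_char(s):
    # Historical-style behavior: map the first character of each
    # alphanumeric word and lowercase the rest. A leading digit is
    # "mapped" (unchanged), so the letters after it stay lowercase.
    out = []
    at_word_start = True
    for ch in s:
        if ch.isalnum():
            out.append(ch.upper() if at_word_start else ch.lower())
            at_word_start = False
        else:
            out.append(ch)
            at_word_start = True
    return "".join(out)


def initcap_first_cased(s):
    # Default Case Conversion style behavior: within each word, advance
    # to the first Cased character, map that one, lowercase the rest.
    out = []
    seen_cased = False
    for ch in s:
        if ch.isalnum():
            if is_cased(ch) and not seen_cased:
                out.append(ch.upper())
                seen_cased = True
            else:
                out.append(ch.lower())
        else:
            out.append(ch)
            seen_cased = False
    return "".join(out)


print(initcap_first_char("123abc"))   # 123abc
print(initcap_first_cased("123abc"))  # 123Abc
```

The two functions agree on input such as 'hello world' and differ only when a word starts with characters that are not Cased, which is the '123abc' case above.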

What is the ICU configuration incantation for this? Maybe we could have
the builtin provider understand some of that, too.

Or we should create a function separate from initcap.

In response to

Re: Built-in CTYPE provider at 2024-03-21 00:13:26 from Jeff Davis

Responses

Re: Built-in CTYPE provider at 2024-03-22 17:26:10 from Jeff Davis
