Re: Does UCS_BASIC have the right CTYPE?

From:	"Daniel Verite" <daniel(at)manitou-mail(dot)org>
To:	"Peter Eisentraut" <peter(at)eisentraut(dot)org>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject:	Re: Does UCS_BASIC have the right CTYPE?
Date:	2023-10-26 21:22:24
Message-ID:	0a443ae9-a206-4e07-88b8-329f6aa74c46@manitou-mail.org
Lists:	pgsql-hackers

Peter Eisentraut wrote:

> > That seems to suggest the standard answer should be 'Á' regardless of
> > any COLLATE clause (though I could be misreading). I'm a bit confused
> > by that... what's the standard-compatible way to specify the locale for
> > UPPER()/LOWER()? If there is none, then it makes sense that Postgres
> > overloads the COLLATE clause for that purpose so that users can use a
> > different locale if they want.
>
> The standard doesn't have the notion of locale-dependent case conversion.

Neither does Unicode, which is why the ICU functions like u_isupper()
or u_toupper() don't take a locale argument.

With libc, isupper_l() and the other ctype functions need a locale
argument, but given a locale's value of
"language[_territory][.codeset]", in theory only the codeset part is
actually useful.
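
As an illustration of the libc side, here is a minimal sketch, assuming a POSIX.1-2008 libc such as glibc where newlocale() and toupper_l() are declared in <locale.h> and <ctype.h> (some platforms need a different header). The helper name upper_in_locale is hypothetical and not from PostgreSQL; it only shows that the single-byte ctype call cannot be made without first building a locale_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): uppercase one byte
 * under the named libc locale. Returns -1 if the locale cannot be created.
 */
int
upper_in_locale(const char *locale_name, unsigned char c)
{
    locale_t    loc;
    int         result;

    assert(locale_name != NULL);

    /* Only the LC_CTYPE category is needed for character classification. */
    loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    /* Unlike ICU's u_toupper(), the libc call requires the locale object. */
    result = toupper_l(c, loc);

    freelocale(loc);
    return result;
}
```

With the always-available "C" locale, upper_in_locale("C", 'a') returns 'A', while a non-ASCII byte such as 0xE1 comes back unchanged because that locale's ctype tables cover ASCII only.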

To me the question of what we should put in pg_collation.collctype
for the "ucs_basic" collation leads to another question which is:
why do we even consider collctype in the first place?

Within a database, there's only one "codeset", which corresponds
to pg_database.encoding, and there's a value in pg_database.lc_ctype
that is normally compatible with that encoding.
ISTM that UPPER(string COLLATE "whatever") should always give
the same result as UPPER(string COLLATE pg_catalog.default). And
likewise all functions that depend on character categories could
basically ignore the COLLATE specification, given that our
database-wide properties are sufficient to characterize the strings
within.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

In response to

Re: Does UCS_BASIC have the right CTYPE? at 2023-10-26 14:49:55 from Peter Eisentraut

Responses

Re: Does UCS_BASIC have the right CTYPE? at 2023-10-26 21:32:14 from Tom Lane
Re: Does UCS_BASIC have the right CTYPE? at 2023-10-26 22:48:26 from Jeff Davis