Re: Allow tailoring of ICU locales with custom rules

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Laurenz Albe" <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow tailoring of ICU locales with custom rules
Date: 2023-02-04 13:41:18
Message-ID: 24dcba27-ef24-4447-9ba8-763c381fcab4@manitou-mail.org
Lists: pgsql-hackers

Laurenz Albe wrote:

> Cool so far. Now I created a database with that locale:
>
> CREATE DATABASE teutsch LOCALE_PROVIDER icu ICU_LOCALE german_phone
> LOCALE "de_AT.utf8" TEMPLATE template0;
>
> Now the rules are not in "pg_database":

The parameter after ICU_LOCALE is passed directly to ICU as a locale
ID, as opposed to referring to a collation name in the current database.
This CREATE DATABASE doesn't fail because ICU accepts pretty much
anything as a locale ID, ignoring what it can't parse instead of
erroring out.

I think the way to express what you want should be:

CREATE DATABASE teutsch
LOCALE_PROVIDER icu
ICU_LOCALE 'de_AT'
LOCALE 'de_AT.utf8'
ICU_RULES '&a < g';

However it still leaves "daticurules" empty in the destination db,
because of an actual bug in the current patch.

Looking at createdb() in dbcommands.c, it creates this variable:

@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
     char *dbcollate = NULL;
     char *dbctype = NULL;
     char *dbiculocale = NULL;
+    char *dbicurules = NULL;
     char dblocprovider = '\0';
     char *canonname;
     int encoding = -1;

and then reads it later

@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
         dblocprovider = src_locprovider;
     if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
         dbiculocale = src_iculocale;
+    if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+        dbicurules = src_icurules;

     /* Some encodings are client only */
     if (!PG_VALID_BE_ENCODING(encoding))

but it forgets to assign it from the ICU_RULES option in between, so it
stays NULL and src_icurules (the value from the template database) is
taken instead.
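
To make the effect concrete, here is a minimal standalone sketch of that
pattern. It is not the patch code: RulesOption and both function names are
made up for illustration, and the dblocprovider check is left out for
brevity. The first function mirrors what is described above (the variable
is never set from the option), the second adds the missing assignment
before falling back to the template's value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the parsed ICU_RULES clause of
 * CREATE DATABASE. arg is NULL when the clause was not given.
 */
typedef struct RulesOption
{
    const char *arg;
} RulesOption;

/*
 * Pattern described above: dbicurules is declared and later defaulted
 * from the template, but never assigned from the option in between,
 * so the template's value always wins.
 */
const char *
icurules_without_assignment(const RulesOption *opt, const char *src_icurules)
{
    const char *dbicurules = NULL;

    assert(opt != NULL);
    (void) opt;                     /* the option is never consulted */

    if (dbicurules == NULL)
        dbicurules = src_icurules;  /* always taken */
    return dbicurules;
}

/*
 * Same logic with the missing step added: take the value from the
 * option if it was given, and only then fall back to the template.
 */
const char *
icurules_with_assignment(const RulesOption *opt, const char *src_icurules)
{
    const char *dbicurules = NULL;

    assert(opt != NULL);
    if (opt->arg != NULL)
        dbicurules = opt->arg;      /* the assignment that was forgotten */

    if (dbicurules == NULL)
        dbicurules = src_icurules;  /* fallback when ICU_RULES is absent */
    return dbicurules;
}
```

With an option carrying '&a < g' and a template that has no rules, the
first variant returns NULL and the second returns the rules from the
option, which matches the empty "daticurules" seen in the destination db.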

> I guess that it is not the fault of this patch that the collation
> isn't there, but I think it is surprising. What good is a database
> collation that does not exist in the database?

Even if the above invocation of CREATE DATABASE worked as you
intuitively expected, by getting the characteristics from the
user-defined collation for the destination db, it still wouldn't work to
refer to COLLATE "german_phone" in the destination database.
That's because there would be no "german_phone" entry in the
pg_collation of the destination db, as it's cloned from the template
db, which has no reason to have this collation.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
