Re: ICU for global collation

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Daniel Verite <daniel(at)manitou-mail(dot)org>, pryzby(at)telsasoft(dot)com
Subject: Re: ICU for global collation
Date: 2022-08-17 16:53:59
Message-ID: 20220817165359.mcco6ufe2vchokqo@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Aug 15, 2022 at 03:06:32PM +0300, Marina Polyakova wrote:
>
> 1.1) It looks like there's a bug in the function get_db_infos
> (src/bin/pg_upgrade/info.c), where the version of the old cluster is always
> checked:
>
> if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
> snprintf(query + strlen(query), sizeof(query) - strlen(query),
> "'c' AS datlocprovider, NULL AS daticulocale, ");
> else
> snprintf(query + strlen(query), sizeof(query) - strlen(query),
> "datlocprovider, daticulocale, ");
>
> With the simple patch
>
> diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
> index df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87
> 100644
> --- a/src/bin/pg_upgrade/info.c
> +++ b/src/bin/pg_upgrade/info.c
> @@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster)
>
> snprintf(query, sizeof(query),
> "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
> - if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
> + if (GET_MAJOR_VERSION(cluster->major_version) < 1500)
> snprintf(query + strlen(query), sizeof(query) - strlen(query),
> "'c' AS datlocprovider, NULL AS daticulocale, ");
> else
>
> I got the expected error during the upgrade:
>
> locale providers for database "template1" do not match: old "libc", new
> "icu"
> Failure, exiting

Good catch. There's unfortunately not a lot of regression tests for
ICU-initialized clusters. I'm wondering if the build-farm client could be
taught about the locale provider rather than assuming libc. Clearly that
wouldn't have caught this issue, but it should still increase the coverage a
bit (I'm thinking of the recent problem with the abbreviated keys).

> 1.2) It looks like the mentioned asserion in dbcommands.c conflicts with the
> following lines earlier:
>
> if (dbiculocale == NULL)
> dbiculocale = src_iculocale;
>
> The following patch works for me:
>
> diff --git a/src/backend/commands/dbcommands.c
> b/src/backend/commands/dbcommands.c
> index b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9
> 100644
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> @@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("ICU locale must be specified")));
> }
> + else
> + dbiculocale = NULL;
>
> if (dblocprovider == COLLPROVIDER_ICU)
> check_icu_locale(dbiculocale);

I think it would be better to do that in the various variable initialization.
Maybe switch the dblocprovider and dbiculocale initialization, and only
initialize dbiculocale if dblocprovider is COLLPROVIDER_ICU.

> 2) CREATE DATABASE does not always require the icu locale unlike initdb and
> createdb:
>
> $ createdb mydb --locale en_US.UTF-8 --template template0 --locale-provider
> icu
> createdb: error: database creation failed: ERROR: ICU locale must be
> specified
>
> $ psql -c "CREATE DATABASE mydb LOCALE \"en_US.UTF-8\" TEMPLATE template0
> LOCALE_PROVIDER icu" postgres
> CREATE DATABASE
>
> I'm wondering if this is not a fully-supported feature (because createdb
> creates an SQL command with LC_COLLATE and LC_CTYPE options instead of
> LOCALE option) or is it a bug in CREATE DATABASE?.. From
> src/backend/commands/dbcommands.c:
>
> if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale)
> {
> if (dlocale && dlocale->arg)
> dbiculocale = defGetString(dlocale);
> }

This discrepancy between createdb and CREATE DATABASE looks like an oversight,
as createdb indeed interprets --locale as:

if (locale)
{
if (lc_ctype)
pg_fatal("only one of --locale and --lc-ctype can be specified");
if (lc_collate)
pg_fatal("only one of --locale and --lc-collate can be specified");
lc_ctype = locale;
lc_collate = locale;
}

AFAIK the fallback in the CREATE DATABASE case is expected as POSIX locale
names should be accepted by icu, so this should work for createdb too.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-08-17 17:44:23 Re: pg_walinspect: ReadNextXLogRecord's first_record argument
Previous Message Bharath Rupireddy 2022-08-17 16:28:56 Re: pg_walinspect: ReadNextXLogRecord's first_record argument