Re: Upgrading locale issues

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "rihad" <rihad(at)mail(dot)ru>
Cc: "Peter Geoghegan" <pg(at)bowt(dot)ie>,"pgsql-general General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Upgrading locale issues
Date: 2019-05-02 13:36:29
Message-ID: f113b2da-eef3-4c55-b5f9-e60032603ee3@manitou-mail.org
Lists: pgsql-general

rihad wrote:

> Thanks for the reply. Do you know what would a "decent" ICU collation be
> to bind to a field's schema definition so it would mimic a UTF-8
> encoding for a multilingual column? Maybe und-x-icu? We aren't as much
> concerned about their sortability in most cases, we just want indexes to
> better handle future PG/ICU upgrades. But what does und(efined) even
> mean with respect to collations?

"und" is short for "undetermined". In this context it means unspecified
language and unspecified country or region. It implies that neither
language-specific nor regional rules will be applied to compare strings.

Using C.UTF-8 as the collation for text fields to index may be the
best trade-off in your case. It should be immune to libc and ICU
upgrades.

With C.UTF-8, a string like 'BC' will sort before 'ab', and punctuation
and accents will also sort differently than with a linguistically aware
collation.
If your applications care about that, it can be fixed by simply
adding COLLATE "default" to the ORDER BY clause of the queries that
are meant to present data to users.
COLLATE "default" means the collation of the database, which
presumably would be something like "language_REGION.UTF-8" in your
case. If you never specified it explicitly, it came from initdb, which
itself got it from the environment of the server.
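
To make the ordering difference concrete, here is a minimal sketch in
Python rather than SQL. It assumes the collation compares strings byte by
byte on their UTF-8 encoding, as the "C" collation does; the function name
and the sample words are made up for the illustration.

```python
# Sketch: emulate a byte-wise ("C"-style) collation by sorting strings
# on their UTF-8 encoded bytes. Assumption: the collation does a plain
# byte comparison with no linguistic rules.
def c_collation_sort(values):
    """Return the values sorted by their UTF-8 byte sequences."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


# Hypothetical sample data: mixed case, punctuation and accents.
words = ["ab", "BC", "été", "zoo", "a-b"]
result = c_collation_sort(words)
print(result)  # ['BC', 'a-b', 'ab', 'zoo', 'été']
```

Uppercase 'BC' comes first, the hyphenated 'a-b' lands before 'ab', and
the accented 'été' ends up after 'zoo'. That is the kind of order users
would see without COLLATE "default" in the ORDER BY clause.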

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
