Quick Links

Re: Upgrading locale issues

From:	rihad <rihad(at)mail(dot)ru>
To:	Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Upgrading locale issues
Date:	2019-05-02 17:45:25
Message-ID:	352ac37a-39e6-5afe-d822-b4e031b9b314@mail.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 05/02/2019 05:36 PM, Daniel Verite wrote:
> rihad wrote:
>
>> Thanks for the reply. Do you know what would a "decent" ICU collation be
>> to bind to a field's schema definition so it would mimic a UTF-8
>> encoding for a multilingual column? Maybe und-x-icu? We aren't as much
>> concerned about their sortability in most cases, we just want indexes to
>> better handle future PG/ICU upgrades. But what does und(efined) even
>> mean with respect to collations?
> "undefined" in this context means unspecified language and
> unspecified country or region. It implies that no language-specific
> nor regional rule will be applied to compare strings.
>
> Using C.UTF-8 as the collation for text fields to index may be the
> best trade-off in your case. It should be immune to libc and ICU
> upgrades.
>
> With C.UTF-8, a string like 'BC' will sort before 'ab', and punctuation
> and accents will also sort differently than with a linguistic-aware
> collation.
Thanks, I'm a bit confused here. AFAIK indexes are used for at least two
things: for speed and for skipping the ORDER BY step (since btree
indexes are already sorted). Will such an "upgrade-immune" C.UTF-8 index
still work correctly for table lookups? And can the existing
en_US.UTF-8 fields' definition be altered in place, without a
dump+restore? en_US.UTF-8 is the default encoding+locale+collation, it
isn't set explicitly for any of our string columns. I assume there's
some "catch-all" ordering taking place even for the C locale, so there
won't be any bizarre things like b coming before a, or generally for any
language, the second letter of its alphabet coming before the first?

> If your applications care about that, it can be fixed by simply
> adding COLLATE "default" to the ORDER BY clause of the queries that
> are meant to present data to users.
> COLLATE "default" means the collation of the database, which
> presumably would be something like "language_REGION.UTF-8" in your
> case. If you never specified it explicitly, it came from initdb which
> itself got it from the environment of the server.
>
>
> Best regards,

In response to

Re: Upgrading locale issues at 2019-05-02 13:36:29 from Daniel Verite

Responses

Re: Upgrading locale issues at 2019-05-03 13:35:47 from Daniel Verite

Browse pgsql-general by date

	From	Date	Subject
Next Message	Ray Cote	2019-05-02 18:22:46	Debugging Failed Startup
Previous Message	Sabit Nepal	2019-05-02 17:14:31	Postgres using inefficient query plan when using OR filter, uses correct indices when using IN clause