From: | Andreas Karlsson <andreas(at)proxel(dot)se> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it? |
Date: | 2017-09-21 09:49:36 |
Message-ID: | be9f0a2c-98dc-3915-6e1b-85a1cf1c0d8a@proxel.se |
Lists: | pgsql-hackers |
On 09/21/2017 01:40 AM, Peter Geoghegan wrote:
> On Wed, Sep 20, 2017 at 4:08 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> pg_import_system_collations() takes care to use the non-BCP-47 style for
>>> such versions, so I think this is working correctly.
>>
>> But CREATE COLLATION doesn't use pg_import_system_collations().
>
> And perhaps more to the point: it is highly confusing that we use one or
> the other of those 2 things ("langtag"/BCP 47 tag or "name"/legacy
> locale name) as "colcollate", depending on ICU version, thereby
> *behaving* as if ICU < 54 really didn't know anything about BCP 47
> tags. Because, obviously earlier ICU versions know plenty about BCP
> 47, since 9 lines further down we use "langtag"/BCP 47 tag as collname
> for CollationCreate() -- regardless of ICU version.
>
> How can you say "ICU <54 doesn't even support the BCP 47 style", given
> all that? Those versions will still have locales named "*-x-icu" when
> users do "\dOS". Users will be highly confused when they quite
> reasonably try to generalize from the example in the docs and what
> "\dOS" shows, and get results that are wrong, often only in a very
> subtle way.
If we are fine with supporting only ICU 4.2 and later (which I think we
are, given that ICU 4.2 was released in 2009), then using
uloc_forLanguageTag()[1] to validate and canonicalize seems like the
right solution. I had missed that this function even existed when I last
read the documentation. Does it return a BCP 47 tag in modern versions
of ICU?

I would strongly prefer that there be, as far as possible, only one
format for inputting ICU locales.
[1] http://www.icu-project.org/apiref/icu4c/uloc_8h.html#aa45d6457f72867880f079e27a63c6fcb
Andreas