From: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it? |
Date: | 2017-10-01 13:56:11 |
Message-ID: | 4ac3c6a8-8495-78c4-0353-504990e5da8a@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 9/30/17 03:01, Noah Misch wrote:
> On Mon, Sep 25, 2017 at 08:26:21AM +0000, Noah Misch wrote:
>> On Tue, Sep 19, 2017 at 07:01:47PM -0700, Peter Geoghegan wrote:
>>> On Tue, Sep 19, 2017 at 5:52 PM, Peter Eisentraut
>>> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>>>> On 9/18/17 18:46, Peter Geoghegan wrote:
>>>>> As I pointed out a couple of times already [1], we don't currently
>>>>> sanitize ICU's BCP 47 language tags within CREATE COLLATION.
>>>>
>>>> There is no requirement that the locale strings for ICU need to be BCP
>>>> 47. ICU locale names like 'de(at)collation=phonebook' are also acceptable.
>>>
>>> Right. But, we only document that BCP 47 is supported by Postgres.
>>> Maybe we can use get_icu_language_tag()/uloc_toLanguageTag() to ensure
>>> that we end up with BCP 47, even when the user happens to specify the
>>> legacy syntax. Should we be "canonicalizing" legacy ICU locale strings
>>> as BCP 47, too?
>>>
>>>> The reason they are not validated is that, as you know, ICU accepts any
>>>> locale string as valid. You appear to have found a way to do some
>>>> validation, but I would like to see that code.
>>>
>>> As I mentioned, I'm simply calling
>>> get_icu_language_tag()/uloc_toLanguageTag() to do that sanitization.
>>> The code to get the extra sanitization is completely trivial.
>>>
>>> I didn't post the patch that generates the errors in my opening e-mail
>>> because I'm not confident it's correct just yet. And, I think that I
>>> see a bigger problem: we pass a string that is almost certainly a BCP
>>> 47 string to ucol_open() from within pg_newlocale_from_collation(). We
>>> do so despite the fact that ucol_open() apparently doesn't accept BCP
>>> 47 syntax locale strings until ICU 54 [1]. Seems entirely possible
>>> that this accounts for the problems you saw on ICU 4.2, back when we
>>> were still creating keyword variants (I guess that the keyword
>>> variants seem very "BCP 47-ish" to me).
>>>
>>> I really think we need to add some kind of debug mode that makes ICU
>>> optionally spit out a locale display name at key points. We need this
>>> to gain confidence that the behavior that ICU provides actually
>>> matches what is expected across ICU different versions for different
>>> locale formats. We talked about this as a user-facing feature before,
>>> which can wait till v11; I just want this to debug problems like this
>>> one.
>>>
>>> [1] https://ssl.icu-project.org/apiref/icu4c/ucol_8h.html#a4721e4c0a519bb0139a874e191223590
>>
>> [Action required within three days. This is a generic notification.]
>>
>> The above-described topic is currently a PostgreSQL 10 open item. Peter,
>> since you committed the patch believed to have created it, you own this open
>> item. If some other commit is more relevant or if this does not belong as a
>> v10 open item, please let us know. Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within three calendar days of
>> this message. Include a date for your subsequent status update. Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping v10. Consequently, I will appreciate your efforts
>> toward speedy resolution. Thanks.
>>
>> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
> This PostgreSQL 10 open item is past due for your status update. On the worst
> week to be violating open item policies. Kindly send a status update within
> 24 hours, and include a date for your subsequent status update. Refer to the
> policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
I had previously replied that I think nothing should be changed. What
process should be applied in that case?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Дмитрий Воронин | 2017-10-01 14:33:24 | Warnings in objectaddress.c |
Previous Message | Amit Kapila | 2017-10-01 13:55:13 | Re: Parallel Append implementation |