Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Date: 2017-09-19 22:47:04
Message-ID: CAH2-Wzn50DHKmOsTYitA0j4NvZOpG5A4WdcXxDRRTY3nyChz-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 19, 2017 at 3:23 PM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
> If people think it is possible to get this done in time for PostgreSQL 10
> and it does not break anything on older version of ICU (or the migration
> from older versions) I am all for it.

The only behavioral difference would occur when CREATE COLLATION is
run (no changes to the initdb behavior, where the real risk exists).

What this boils down to is that we have every reason to think that the
right thing is to reject something that ICU's uloc_toLanguageTag()
itself rejects as invalid (through the get_icu_language_tag()
function). It looked like we were equivocating on following this at
one point, prior to 2bfd1b1, in order to suit ICU 4.2 (again, see
commit eccead9). I tend to think that the way we used to concatenate
variant keywords against locale names during initdb was simply wrong
in ICU 4.2, for some unknown reason. I think that the behavior that I
propose might prevent things from silently failing on ICU 4.2 that
were previously *believed* to work there, but in fact these things
(variant keywords) never really worked.

There might be an argument to be made for passing strict = FALSE to
uloc_toLanguageTag() instead, so that we replace the language tag with
one that is known to have valid syntax, and store that in pg_collation
instead (while possibly raising a NOTICE). I guess that that would
actually be the minimal fix here. I still favor a hard reject of
invalid BCP 47 syntax, though. PostgreSQL's CREATE COLLATION is not
one of the places where this kind of leniency makes sense.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-19 22:55:31 Re: Setting pd_lower in GIN metapage
Previous Message David Steele 2017-09-19 22:36:01 Re: Show backtrace when tap tests fail