Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-07 20:21:19
Message-ID: CAH2-Wzm=HJ6_TXjftfXv+Nk69xBvRd=Pc8N0BPy+oHzjq-Gw=Q@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Mon, Aug 7, 2017 at 12:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Well, the fact that they're "redundant" doesn't really help you if
> you can't pg_upgrade because the collation name you chose in v10 is
> not present in initdb's results in v11. So this is still a serious
> issue to my mind.

I agree.

Even MongoDB has ICU support these days. They specifically document
which collations are supported. It's just the same for DB2, and other
systems that build their collations on ICU. Users do not "use the ICU
collations" on these other systems. They simply use the collations
that are available, choosing from a list in the documentation, or
possibly create their own collations with their own customization.

The ICU collations are based on the CLDR data and an IETF standard's
idea of a locale identifier [1], so in an important sense they're
supposed to be universal; they're not tied to ICU in particular. This
is probably why ICU is ridiculously forgiving of alternate collation
names, and will not throw an error if you specify an ICU collation
name that is total garbage within CREATE COLLATION (there is actually
a Postgres regression test that demonstrates this). As far as ICU is
concerned, the name may be coming from an end user over the web, where
it makes sense to be so forgiving.

Even stuff like the names for emoji collations, or phonebook
collations, are covered by a standard, though it's not quite an IETF
standard. RFC 6067 says that the CLDR data is the authoritative source
for which variant subtags are allowed, and ICU uses CLDR, which comes
from the Unicode Consortium.
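
To make the naming scheme concrete, here is a rough Python sketch of
how a BCP 47 tag with the RFC 6067 "u" extension breaks down into
key/type pairs, for names such as "de-u-co-phonebk" or
"und-u-co-emoji". The function name parse_u_extension is made up for
illustration; this is not ICU or Postgres code, it does not check the
subtags against CLDR, and it ignores private-use ("x") sequences.

```python
def parse_u_extension(tag):
    """Split a BCP 47 tag into (base subtags, {key: type}) for its "u" extension.

    Illustrative sketch only: no CLDR validation, no private-use handling.
    """
    subtags = tag.lower().replace("_", "-").split("-")
    if "u" not in subtags:
        return subtags, {}

    start = subtags.index("u")
    base, extension = subtags[:start], subtags[start + 1:]

    keywords = {}
    key = None
    for subtag in extension:
        if len(subtag) == 1:
            # Another singleton begins a different extension; stop here.
            break
        if len(subtag) == 2:
            # Keys are exactly two characters long (for example "co").
            key = subtag
            keywords[key] = []
        elif key is not None:
            # Longer subtags form the type of the most recent key.
            keywords[key].append(subtag)
        # Attributes that precede the first key are skipped in this sketch.

    return base, {k: "-".join(v) for k, v in keywords.items()}


if __name__ == "__main__":
    for name in ("de-u-co-phonebk", "und-u-co-emoji", "de-DE-u-co-phonebk-kn-true"):
        print(name, parse_u_extension(name))
```

A parser like this only recognizes the shape of the name. Deciding
whether "phonebk" or "emoji" is a valid collation type is a separate
question that has to be answered from the CLDR data.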

We need to move further away from the idea that there are ICU
collations just like there are libc collations.

[1] https://www.rfc-editor.org/rfc/bcp/bcp47.txt
--
Peter Geoghegan
