From: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | ICU for global collation |
Date: | 2019-08-20 14:21:21 |
Message-ID: | 5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com |
Lists: | pgsql-hackers |
Here is an initial patch to add the option to use ICU as the global
collation provider, a long-requested feature.
To activate, use something like
initdb --collation-provider=icu --locale=...
One complication is that, since we still need to set the normal POSIX
locales as well, the --locale value must be valid both as a POSIX locale
and as an ICU locale. If no single value works for both, the ICU locale
can also be specified separately, e.g.,
initdb --collation-provider=icu --locale=en_US.utf8 --icu-locale=en
This complexity is unfortunate, but I don't see a way around it right now.
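The rule just described can be sketched roughly like this: a separately
given ICU locale (as with --icu-locale) takes precedence, otherwise the
--locale value has to serve both roles, and it must still name a usable
POSIX locale either way. This is a hypothetical illustration, not code
from the patch; the function names are invented, and only the POSIX half
of the validity check is shown (via newlocale()), since checking the ICU
half would require linking against ICU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: pick the name that would be handed to ICU.
 * A separately specified ICU locale wins; otherwise the --locale
 * value has to do double duty. */
static const char *
choose_icu_locale(const char *locale, const char *icu_locale)
{
    return (icu_locale != NULL) ? icu_locale : locale;
}

/* Hypothetical helper: the --locale value must still be a POSIX locale
 * usable for LC_COLLATE and LC_CTYPE.  newlocale() returns (locale_t) 0
 * if the system does not know the name. */
static int
is_valid_posix_locale(const char *name)
{
    locale_t loc;

    assert(name != NULL);
    loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return 0;
    freelocale(loc);
    return 1;
}
```

With --locale=en_US.utf8 --icu-locale=en, choose_icu_locale() would yield
"en" for ICU while en_US.utf8 is still checked and used as the POSIX locale.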
There are also options for createdb and CREATE DATABASE to do this for a
particular database only.
Besides this, the implementation is quite small: when starting up a
database, we create an ICU collator object, store it in a global
variable, and then use it when appropriate. All the ICU code for
creating and invoking those collators already exists, of course.
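The "create once at startup, keep in a global, use when comparing" pattern
might look something like the sketch below. It is only an illustration
under stated assumptions: the real thing would hold an ICU UCollator
(created with ICU's ucol_open()), whereas here a POSIX locale_t with
strcoll_l() stands in so the sketch builds without ICU. The names
global_collator, init_global_collator() and compare_text() are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical global, set once at database startup.  Stand-in for an
 * ICU collator object; a POSIX locale_t is used so no ICU is needed. */
static locale_t global_collator = (locale_t) 0;

/* Called once when the database starts up; returns 0 on success,
 * -1 if the locale name is not known to the system. */
static int
init_global_collator(const char *locale_name)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t) 0);

    if (loc == (locale_t) 0)
        return -1;
    if (global_collator != (locale_t) 0)
        freelocale(global_collator);
    global_collator = loc;
    return 0;
}

/* String comparison "when appropriate": go through the global collator
 * if one has been set up, otherwise fall back to plain byte order. */
static int
compare_text(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (global_collator != (locale_t) 0)
        return strcoll_l(a, b, global_collator);
    return strcmp(a, b);
}
```

Keeping a single object in a global means the per-comparison cost is just
the collation call itself; nothing is looked up or rebuilt per call.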
For collation version tracking, I use the pg_collation row for the
"default" collation. Again, this mostly reuses existing code and concepts.
Nondeterministic collations are not supported for the global collation,
because LIKE and regular expressions don't work with them, which would
break some system views. This needs some separate research.
To test, run the existing regression tests against a database
initialized with ICU. Perhaps some options for pg_regress could
facilitate that.
I fear that the Localization chapter in the documentation will need a
bit of a rewrite after this, because the hitherto separately treated
concepts of locale and collation are fusing together. I haven't done
that here yet, but that would be the plan for later.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-option-to-use-ICU-as-global-collation-provide.patch | text/plain | 43.9 KB |