From: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
---|---|
To: | Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ICU integration |
Date: | 2017-03-15 16:33:23 |
Message-ID: | 5291804b-169e-3ba9-fdaf-fae8e7d2d959@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Updated patch attached.
On 3/14/17 22:15, Andreas Karlsson wrote:
> I do not like the schema based solution since search_path already gives
> us enough headaches. As for the naming I am fine with the current scheme.
Yeah, it seems we're going to settle on the suffix idea. See below for
an idea that builds on that.
> - I get a test failures in the default test suite due to not having the
> tr_TR locale installed. I would assume that this would be pretty common
> for hackers.
I have removed that test. It seems it's not possible to test that
portably without major contortions.
> - The code no longer compiles without HAVE_LOCALE_T.
Fixed that.
> - I do not like how it is not obvious which is the default version of
> every locale. E.g. I believe that "sv%icu" is "sv_reformed%icu" and not
> "sv_standard%icu" as one might expect. Is this inherent to ICU or
> something we can work around?
We get these keywords from ucol_getKeywordValuesForLocale(), which says
"Given a key and a locale, returns an array of string values in a
preferred order that would make a difference." So all those are
supposedly different from each other.
> - ICU talks about a new syntax for locale extensions (old:
> de_DE(at)collation=phonebook, new: de_DE_u_co_phonebk) at this page
> http://userguide.icu-project.org/collation/api. Is this something we
> need to car about? It looks like we currently use the old format, and
> while I personally prefer it I am not sure we should rely on an old syntax.
Interesting. I hadn't see this before, and the documentation is sparse.
But it seems that this refers to BCP 47 language tags, which seem like
a good idea.
So what I have done is change all the predefined ICU collations to be
named after the BCP 47 scheme, with a "private use" -x-icu suffix
(instead of %icu). The preserves the original idea but uses a standard
naming scheme.
I'm not terribly worried that they are going to remove the "old" locale
naming, but just to be forward looking, I have changed it so that the
collcollate entries are made using the "new" naming for ICU >=54.
> - I get an error when creating a new locale.
>
> #CREATE COLLATION sv2 (LOCALE = 'sv');
> ERROR: could not create locale "sv": Success
>
> # CREATE COLLATION sv2 (LOCALE = 'sv');
> ERROR: could not create locale "sv": Resource temporarily unavailable
> Time: 1.109 ms
Hmm, that's pretty straightforward code. What is your operating system?
What are the build options? Does it work without this patch?
> - For the collprovider is it really correct to say that 'd' is the
> default value as it does in catalogs.sgml?
It doesn't say it's the default value, it says it uses the database
default. This is all a bit confusing. We have a collation object named
"default", which uses the locale set for the database. That's been that
way for a while. Now when introducing the collation providers, that
"default" collation gets its own collprovider category 'd'. That is not
really used anywhere, but we have to put something there.
> - I do not know what the policy for formatting the documentation is, but
> some of the paragraphs are in need of re-wrapping.
Hmm, I don't see anything terribly bad.
> - Add a hint to "ERROR: conflicting or redundant options". The error
> message is pretty unclear.
I don't see that in my patch. Example?
> - I am not a fan of this patch comparing the encoding with a -1 literal.
> How about adding -1 as a value to the enum? See the example below for a
> place where the patch compares with -1.
That's existing practice. Not a great practice, probably, but widespread.
> - The patch adds "FIXME" in the below. Is that a left-over from
> development or something which still needs doing?
>
> /*
> * Also forbid matching an any-encoding entry. This test of course
> is not
> * backed up by the unique index, but it's not a problem since we don't
> - * support adding any-encoding entries after initdb.
> + * support adding any-encoding entries after initdb. FIXME
> */
I had mentioned that upthread. It technically needs "doing" as you say,
but it's not clear how and it's not terribly important, arguably.
> - Should functions like normalize_locale_name() be renamed to indicate
> they relate to libc locales? I am leaning towards doing so but have not
> looked closely at the task.
Renamed.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
v6-0001-ICU-support.patch | application/x-patch | 156.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Mengxing Liu | 2017-03-15 16:35:34 | Re: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions |
Previous Message | Emre Hasegeli | 2017-03-15 16:32:58 | Re: Parallel Bitmap scans a bit broken |