From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ICU for global collation |
Date: | 2019-09-17 13:08:36 |
Message-ID: | 5d807706-60a2-4e56-bc59-eef9e7deb138@manitou-mail.org |
Lists: | pgsql-hackers |
Hi,
When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).
That's a problem for the backend code that depends on libc functions
that themselves depend on LC_CTYPE, such as the full text search parser
and dictionaries.
For example, if you start the instance with a C locale
(LC_ALL=C pg_ctl...) and try to use FTS in an ICU UTF-8 database,
it doesn't work:
template1=# create database "fr-utf8"
template 'template0' encoding UTF8
locale 'fr'
collation_provider 'icu';
template1=# \c fr-utf8
You are now connected to database "fr-utf8" as user "daniel".
fr-utf8=# show lc_ctype;
lc_ctype
----------
fr
(1 row)
fr-utf8=# select to_tsvector('été');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.
If I peek into the "real" LC_CTYPE when connected to this database,
I can see it's "C":
fr-utf8=# create extension plperl;
CREATE EXTENSION
fr-utf8=# create function lc_ctype() returns text as '$ENV{LC_CTYPE};'
language plperl;
CREATE FUNCTION
fr-utf8=# select lc_ctype();
lc_ctype
----------
C
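One caveat on the check above: $ENV{LC_CTYPE} reads the environment
variable, which is only a proxy for the locale that setlocale() has
actually installed in the process. A minimal sketch of that distinction,
outside PostgreSQL, in Python (the two helper names are hypothetical and
not part of any PostgreSQL or Perl API):

```python
import locale
import os


def env_lc_ctype():
    """Return the LC_CTYPE environment variable, or None if it is unset."""
    return os.environ.get("LC_CTYPE")


def effective_lc_ctype():
    """Return the LC_CTYPE locale the C library is using right now.

    Passing None only queries the current setting; it changes nothing.
    """
    return locale.setlocale(locale.LC_CTYPE, None)


if __name__ == "__main__":
    # Force the "C" locale for character classification, as a process
    # started under LC_ALL=C would have.
    locale.setlocale(locale.LC_CTYPE, "C")
    print("environment:", env_lc_ctype())
    print("effective:  ", effective_lc_ctype())
```

The two values can differ, because changing the environment variable
has no effect on the process until setlocale() is called again.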
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite