From: | 周正中(德歌) <dege(dot)zzz(at)alibaba-inc(dot)com> |
---|---|
To: | "Haotian Yang" <yangnw(at)live(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: `pg_trgm` not recognizing Chinese characters in macOS |
Date: | 2018-09-12 05:02:48 |
Message-ID: | 31ad828c-7926-41d7-b54e-6d3c79cc2a03.dege.zzz@alibaba-inc.com |
Lists: | pgsql-bugs |
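You should not set lc_ctype to C. In the session below, a database with a UTF-8 lc_ctype (en_US.UTF8) keeps the Chinese characters as trigrams. A database created with lc_ctype='C' drops them: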
```
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+------------+------------+-----------------------
newdb | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 |
postgres | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 |
template0 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
postgres=# select show_trgm('hello你好');
show_trgm
------------------------------------------------------
{0xcf7970,0xfe5170,0x114ebf,"  h"," he",ell,hel,llo}
(1 row)
postgres=# create database testdb with template template0 lc_ctype='C';
CREATE DATABASE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create extension pg_trgm;
CREATE EXTENSION
testdb=# select show_trgm('hello你好');
show_trgm
---------------------------------
{"  h"," he",ell,hel,llo,"lo "}
(1 row)
```
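The difference between the two databases can be illustrated with a simplified Python re-implementation of pg_trgm's trigram extraction. This is a sketch for illustration, not pg_trgm's actual C code. Each run of word characters is lower-cased, padded with two leading spaces and one trailing space, and cut into three-character windows. The two predicates are assumptions that stand in for libc's classification. `is_word_char_c_locale` treats only ASCII letters and digits as word characters, as under lc_ctype='C'. `is_word_char_unicode` uses Python's Unicode `str.isalnum()` to approximate a UTF-8 locale that classifies CJK characters as alphabetic. pg_trgm also stores trigrams containing multibyte characters as a 3-byte hash, which is why `show_trgm` prints them as hex values. The sketch keeps them as text.

```python
def is_word_char_c_locale(ch: str) -> bool:
    # Assumed model of lc_ctype='C': only ASCII letters and digits are word characters.
    return ch.isascii() and ch.isalnum()


def is_word_char_unicode(ch: str) -> bool:
    # Assumed model of a UTF-8 locale whose iswalpha() knows CJK characters.
    return ch.isalnum()


def trigrams(text: str, is_word_char) -> list[str]:
    """Simplified pg_trgm-style extraction: returns the sorted, de-duplicated trigrams."""
    result: set[str] = set()
    word: list[str] = []

    def flush() -> None:
        # Pad the finished word with two leading blanks and one trailing blank.
        if word:
            padded = "  " + "".join(word).lower() + " "
            for i in range(len(padded) - 2):
                result.add(padded[i:i + 3])
            word.clear()

    for ch in text:
        if is_word_char(ch):
            word.append(ch)
        else:
            flush()  # a non-word character ends the current word
    flush()
    return sorted(result)


if __name__ == "__main__":
    # ['  h', ' he', 'ell', 'hel', 'llo', 'lo ']  (Chinese characters dropped)
    print(trigrams("hello你好", is_word_char_c_locale))
    # 8 trigrams: the Chinese characters stay part of the word "hello你好"
    print(trigrams("hello你好", is_word_char_unicode))
```

The ASCII-only predicate ends the word at `你`, so only the six trigrams of `hello` survive. This matches the testdb output. The Unicode predicate keeps `hello你好` as one word, which yields the three extra trigrams that the en_US.UTF8 database shows as hex.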
------------------------------------------------------------------
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Tuesday, September 11, 2018 21:20
To: Haotian Yang <yangnw(at)live(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: `pg_trgm` not recognizing Chinese characters in macOS
Haotian Yang <yangnw(at)live(dot)com> writes:
> Versions: macOS 10.13.6, PostgreSQL 10.5, pg_trgm 1.3.
> LC_ALL=en_US.UTF-8
pg_trgm relies on libc's functions (specifically, iswalpha()) to determine
what is a word character or not. Unfortunately, the UTF8 locale support
in macOS is pretty incomplete, and I don't find it too surprising that
it's not recognizing Chinese characters as alphabetic. Now, you could
make a good argument that they *shouldn't* be considered alphabetic in
an en_US locale; but I'm unsure whether switching to a more appropriate
locale will help.
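You can check what a given libc's `iswalpha()` reports for a Chinese character under a particular locale. The following minimal Python sketch assumes a POSIX system where `ctypes` can load the C library. It switches LC_CTYPE to the requested locale, asks libc's `iswalpha()` about one character, and then restores the previous setting. The helper name `libc_iswalpha` is hypothetical. A locale name such as `zh_CN.UTF-8` only works if that locale is installed, and the function returns `None` if it is not. The sketch probes libc directly, not PostgreSQL.

```python
import ctypes
import ctypes.util
import locale


def libc_iswalpha(ch: str, locale_name: str):
    """Hypothetical helper: libc's iswalpha() verdict for ch under locale_name,
    or None if that locale is not available on this system."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.iswalpha.argtypes = [ctypes.c_int]  # assumption: code point passed as a C int
    libc.iswalpha.restype = ctypes.c_int
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current LC_CTYPE
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None  # locale not installed
    try:
        return libc.iswalpha(ord(ch)) != 0
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous LC_CTYPE


if __name__ == "__main__":
    for name in ("C", "en_US.UTF-8", "zh_CN.UTF-8"):
        print(name, libc_iswalpha("你", name))
```

If `你` comes back `False` under a UTF-8 locale, pg_trgm running with that lc_ctype would likewise treat it as a word separator. The results are platform-dependent, so no expected output is given here.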
Anyway, I'd first try zh_CN.UTF-8, and if that doesn't fix it, the place
to complain is https://bugreport.apple.com/ ... I'm sure they know about
it already, but the number of reports has an impact on how fast they
fix things.
regards, tom lane