From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | h8mastre(at)gmail(dot)com |
Subject: | BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters |
Date: | 2018-11-01 02:39:20 |
Message-ID: | 15476-4314f480acf0f114@postgresql.org |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 15476
Logged by: Kenji Uno
Email address: h8mastre(at)gmail(dot)com
PostgreSQL version: 9.6.2
Operating system: Windows Server 2012 Japanese
Description:
# Problem on show_trgm with 4 byte UTF-8 characters
On a database with UTF-8 encoding, try:
SELECT show_trgm('123');
→ OK
SELECT show_trgm('日本語');
→ probably OK.
SELECT show_trgm('🔍');
→ ERROR!
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.
SQL state: 22021
I reviewed some of the source code and found a suspicious spot.
Please check: t_isdigit, t_isspace, t_isalpha, and t_isprint.
https://github.com/postgres/postgres/blob/322548a8abe225f2cfd6a48e07b99e2711d28ef7/src/backend/tsearch/ts_locale.c#L35
The 4th parameter of char2wchar should be the number of input bytes.
However, I suspect these functions pass a character count instead:
int clen = pg_mblen(ptr);
...
char2wchar(character, 2, ptr, clen, mylocale);
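To make the different "lengths" of a 4-byte character concrete, here is a minimal sketch in C. The helpers utf8_byte_len, utf8_decode, and utf16_units are hypothetical names written for this illustration. They are not PostgreSQL functions, and they assume well-formed UTF-8 input. The sketch only shows that one character can have three different counts (bytes, code points, 16-bit units). It does not establish which count each argument in the call above actually carries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte length of one UTF-8 character, read from its
 * lead byte. Assumes well-formed input; unrecognized lead bytes count as 1. */
int utf8_byte_len(const unsigned char *s)
{
    assert(s != NULL);
    if ((s[0] & 0x80) == 0x00) return 1;   /* 0xxxxxxx */
    if ((s[0] & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((s[0] & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((s[0] & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 1;
}

/* Hypothetical helper: decode one well-formed UTF-8 character
 * into its Unicode code point. */
uint32_t utf8_decode(const unsigned char *s)
{
    /* Payload bits carried by the lead byte, indexed by byte length. */
    static const unsigned char lead_mask[] = {0x00, 0x7F, 0x1F, 0x0F, 0x07};
    int len = utf8_byte_len(s);
    uint32_t cp = s[0] & lead_mask[len];
    int i;

    /* Each continuation byte contributes its low 6 bits. */
    for (i = 1; i < len; i++)
        cp = (cp << 6) | (uint32_t) (s[i] & 0x3F);
    return cp;
}

/* Hypothetical helper: number of 16-bit units needed to store the code
 * point as UTF-16 (2 means a surrogate pair). */
int utf16_units(uint32_t cp)
{
    return cp >= 0x10000 ? 2 : 1;
}
```

For '🔍' (U+1F50D, bytes F0 9F 94 8D) this gives 4 bytes, 1 code point, and 2 UTF-16 units. For '日' (U+65E5) it gives 3 bytes, 1 code point, and 1 unit. On Windows, wchar_t is 16 bits wide, so a character outside the Basic Multilingual Plane needs two wchar_t units plus room for a terminator. That may be relevant to the buffer size of 2 in the call above, though this is not confirmed here.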
Could you please look into this?