From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | pgsql: Make ts_locale.c's character-type functions cope with UTF-16. |
Date: | 2018-11-03 17:56:32 |
Message-ID: | E1gJ09w-00057u-5s@gemulon.postgresql.org |
Lists: | pgsql-committers |
Make ts_locale.c's character-type functions cope with UTF-16.
On Windows, in UTF8 database encoding, what char2wchar() produces is
UTF-16, not UTF-32; i.e., characters above U+FFFF will be represented by
surrogate pairs. t_isdigit() and siblings did not account for this
and failed to provide a large enough result buffer. That in turn
led to bogus "invalid multibyte character for locale" errors, because
contrary to what you might think from char2wchar()'s documentation,
its Windows code path doesn't cope sanely with buffer overflow.
The solution for t_isdigit() and siblings is pretty clear: provide
a 3-wchar_t result buffer not 2.
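To illustrate why three elements are needed, here is a minimal sketch of
UTF-16 encoding for a single code point. It is not the code from this
commit: utf16_encode() is a hypothetical helper, and uint16_t stands in for
the 16-bit wchar_t found on Windows. A character above U+FFFF takes two
code units plus the terminator, so a 2-element buffer cannot hold it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PostgreSQL): encode one Unicode code
 * point as UTF-16 and zero-terminate the result.  Returns the number of
 * code units written, not counting the terminator, or (size_t) -1 if the
 * output buffer is too small.
 */
static size_t
utf16_encode(uint32_t cp, uint16_t *out, size_t outlen)
{
    /* Code points above U+FFFF need a surrogate pair. */
    size_t      need = (cp > 0xFFFF) ? 2 : 1;

    assert(cp <= 0x10FFFF);
    assert(cp < 0xD800 || cp > 0xDFFF); /* lone surrogates are not characters */

    /* Room is needed for the code units plus the terminator. */
    if (outlen < need + 1)
        return (size_t) -1;     /* report the problem instead of overrunning */

    if (need == 1)
        out[0] = (uint16_t) cp;
    else
    {
        cp -= 0x10000;
        out[0] = (uint16_t) (0xD800 | (cp >> 10));      /* high surrogate */
        out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));    /* low surrogate */
    }
    out[need] = 0;
    return need;
}
```

With this sketch, U+1D7CE (MATHEMATICAL BOLD DIGIT ZERO) is rejected for a
2-element buffer but fits in a 3-element one as 0xD835 0xDFCE followed by
the terminator.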
char2wchar() also needs some work to provide more consistent, and more
accurately documented, buffer overrun behavior. But that's a bigger job
and it doesn't actually have any immediate payoff, so leave it for later.
Per bug #15476 from Kenji Uno, who deserves credit for identifying the
cause of the problem. Back-patch to all active branches.
Discussion: https://postgr.es/m/15476-4314f480acf0f114@postgresql.org
Branch
------
REL9_5_STABLE
Details
-------
https://git.postgresql.org/pg/commitdiff/6e6092989fbb27fecdbfbcd66f39e10f18aa0a69
Modified Files
--------------
src/backend/tsearch/ts_locale.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)