From: | emmanuel(dot)reynard(at)eranea(dot)com |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #14072: char() and application of locale collation |
Date: | 2016-04-07 10:28:20 |
Message-ID: | 20160407102820.15196.86869@wrigleys.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 14072
Logged by: Emmanuel Reynard
Email address: emmanuel(dot)reynard(at)eranea(dot)com
PostgreSQL version: 9.4.7
Operating system: linux
Description:
Hello,
We've been using a custom locale since we needed to use special ordering for
certain values in a UTF-8 database.
For example, E'\xEC\x82\x80' (E'\uC080') is defined as the first in the
list, before all other possible codepoints/characters.
While it works fine for varchars, properly placing this codepoint before a
space, it doesnt work as we expected in the case of chars.
This seems to be explained by the fact that bcharcmp() (in
src/backend/utils/adt/varchar.c) uses the bcTrueLength of said char(), which
does not count the number of trailing spaces.
Hence a blank char(3) with value '\x202020' is of "true length" 0 and
smaller than any other string, and any locale information will be
disregarded.
This behaviour already had surprised us while calling the length() function
on char(x) with blank pads :
test_utf8=# select length(' '::char(3)) as char_len, char_length('
'::char(3)) as char_charlen, octet_length(' '::char(3)) as char_octetlen,
length(' '::varchar(3)) as varchar_len;
char_len | char_charlen | char_octetlen | varchar_len
----------+--------------+---------------+-------------
0 | 0 | 3 | 3
(1 row)
Is this considered normal? Especially the case with char(x) which can ignore
partially ignore collation if they have trailing blanks.
I can imagine that changing this length behaviour might break a few things
though, is there a workaround besides creating a new custom type?
Cheers.
Emmanuel Reynard
- test case :
create table test_collation (c1 char(3), c2 varchar(3));
insert into test_collation values (' ', ' ');
insert into test_collation values (E'\xEC\x82\x80', E'\xEC\x82\x80');
insert into test_collation values (' ' || E'\uC080', ' ' || E'\uC080');
select c2, c2::bytea from test_collation order by c2 collate "fr_FR.utf8";
c2 | c2
-----+------------
| \x20
삀 | \x20ec8280
삀 | \xec8280
select c2, c2::bytea from test_collation order by c2 collate
"locale_eranea";
c2 | c2
-----+------------
삀 | \xec8280
| \x20
삀 | \x20ec8280
select c1, c1::bytea from test_collation order by c1 collate "fr_FR.utf8";
c1 | c1
------+--------------
| \x202020
삀 | \x20ec828020
삀 | \xec82802020
select c1, c1::bytea from test_collation order by c1 collate
"locale_eranea";
c1 | c1
------+--------------
| \x202020
삀 | \xec82802020
삀 | \x20ec828020
From | Date | Subject | |
---|---|---|---|
Next Message | Tim Singletary | 2016-04-07 11:15:47 | Re: BUG #14054: "create index using gist ..." on large table never completes. |
Previous Message | Miklós Fazekas | 2016-04-07 08:26:15 | released savepoint blocking further statements |