From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) |
Date: | 2016-03-23 01:20:00 |
Message-ID: | 20160323012000.GK3127@tamriel.snowman.net |
Lists: | pgsql-bugs pgsql-hackers |
* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
Results for Ubuntu 14.04:
sfrost(at)dwemer:/home/sfrost> sh tryalllocales.sh
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (36) and strxfrm (35) orders
inconsistency between strcoll (35) and strxfrm (36) orders
inconsistency between strcoll (160) and strxfrm (159) orders
inconsistency between strcoll (159) and strxfrm (160) orders
inconsistency between strcoll (347) and strxfrm (346) orders
inconsistency between strcoll (348) and strxfrm (347) orders
inconsistency between strcoll (346) and strxfrm (348) orders
inconsistency between strcoll (355) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (440) and strxfrm (439) orders
inconsistency between strcoll (441) and strxfrm (440) orders
inconsistency between strcoll (439) and strxfrm (441) orders
inconsistency between strcoll (450) and strxfrm (449) orders
inconsistency between strcoll (449) and strxfrm (450) orders
inconsistency between strcoll (454) and strxfrm (452) orders
inconsistency between strcoll (455) and strxfrm (453) orders
inconsistency between strcoll (452) and strxfrm (454) orders
inconsistency between strcoll (453) and strxfrm (455) orders
inconsistency between strcoll (521) and strxfrm (520) orders
inconsistency between strcoll (520) and strxfrm (521) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (682) and strxfrm (681) orders
inconsistency between strcoll (681) and strxfrm (682) orders
inconsistency between strcoll (743) and strxfrm (742) orders
inconsistency between strcoll (742) and strxfrm (743) orders
inconsistency between strcoll (830) and strxfrm (829) orders
inconsistency between strcoll (829) and strxfrm (830) orders
inconsistency between strcoll (870) and strxfrm (869) orders
inconsistency between strcoll (869) and strxfrm (870) orders
inconsistency between strcoll (933) and strxfrm (931) orders
inconsistency between strcoll (931) and strxfrm (932) orders
inconsistency between strcoll (932) and strxfrm (933) orders
de_DE.utf8 BAD
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Thanks!
Stephen
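
For readers without the attachment, the kind of check described in the quoted message (generate random UTF-8 strings, then see whether strcoll and strxfrm order them alike) can be roughly sketched as below. This is not the attached program: it is a minimal illustration using Python's `locale.strcoll` and `locale.strxfrm` wrappers, which on some builds may call the wide-character variants of the C functions, so results need not match a C test exactly. The helper names, the alphabet, and the pairwise comparison strategy are all assumptions made for illustration.

```python
import locale
import random


def random_strings(count, max_len=8, alphabet="abcxyzäöüßÄÖÜ -", seed=0):
    """Generate reproducible random test strings.

    The alphabet is an arbitrary illustrative choice, not the one used
    by the program attached to the original message.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(count)
    ]


def sign(n):
    """Collapse a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def find_inconsistencies(strings):
    """Return pairs that strcoll and strxfrm order differently.

    Uses whatever LC_COLLATE setting is currently active.
    """
    keys = [locale.strxfrm(s) for s in strings]
    bad = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            by_coll = sign(locale.strcoll(strings[i], strings[j]))
            by_xfrm = (keys[i] > keys[j]) - (keys[i] < keys[j])
            if by_coll != by_xfrm:
                bad.append((strings[i], strings[j]))
    return bad


def check_locale(name, count=200):
    """Switch LC_COLLATE to `name` and test a random sample of strings."""
    # Raises locale.Error if the locale is not installed on this system.
    locale.setlocale(locale.LC_COLLATE, name)
    return find_inconsistencies(random_strings(count))
```

Calling `check_locale("de_DE.utf8")` returns the list of disagreeing pairs found in the sample. An empty list only means that no inconsistency showed up among the strings tried, not that the locale is safe.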