From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) |
Date: | 2016-03-23 01:15:09 |
Message-ID: | 20160323011509.GJ3127@tamriel.snowman.net |
Lists: | pgsql-bugs pgsql-hackers |
* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
Results for Ubuntu 15.10:
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
de_DE.utf8 good
Using LC_COLLATE = "en_AG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AG.utf8 good
Using LC_COLLATE = "en_AU.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_AU.utf8 good
Using LC_COLLATE = "en_BW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_BW.utf8 good
Using LC_COLLATE = "en_CA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_CA.utf8 good
Using LC_COLLATE = "en_DK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_DK.utf8 good
Using LC_COLLATE = "en_GB.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_GB.utf8 good
Using LC_COLLATE = "en_HK.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_HK.utf8 good
Using LC_COLLATE = "en_IE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IE.utf8 good
Using LC_COLLATE = "en_IN.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_IN.utf8 good
Using LC_COLLATE = "en_NG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NG.utf8 good
Using LC_COLLATE = "en_NZ.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_NZ.utf8 good
Using LC_COLLATE = "en_PH.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_PH.utf8 good
Using LC_COLLATE = "en_SG.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_SG.utf8 good
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good
Using LC_COLLATE = "en_ZA.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZA.utf8 good
Using LC_COLLATE = "en_ZM.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZM.utf8 good
Using LC_COLLATE = "en_ZW.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_ZW.utf8 good
Will try on others.
Thanks!
Stephen
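
For readers without the attachment, here is a minimal sketch of the kind of check described in the quoted text: compare the ordering given by strcoll() with the ordering given by strcmp() on strxfrm() output. This is not Tom's program. The function names collation_agrees() and count_disagreements() are made up for illustration, and the sketch takes caller-supplied strings instead of generating random UTF8 input.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or +1 so two methods can be compared. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Return a malloc'd strxfrm() transform of s, or NULL if malloc fails. */
static char *xfrm_dup(const char *s)
{
    /* With a size of 0, strxfrm() only reports the length it needs. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *buf = malloc(need);

    if (buf != NULL)
        strxfrm(buf, s, need);
    return buf;
}

/*
 * Hypothetical helper: returns 1 if strcoll(a, b) and strcmp() on the
 * strxfrm() transforms order the pair the same way, 0 if they disagree,
 * and -1 if memory could not be allocated.
 */
int collation_agrees(const char *a, const char *b)
{
    char *xa = xfrm_dup(a);
    char *xb = xfrm_dup(b);
    int result = -1;

    if (xa != NULL && xb != NULL)
        result = (sign(strcoll(a, b)) == sign(strcmp(xa, xb)));
    free(xa);
    free(xb);
    return result;
}

/* Hypothetical helper: count the pairs in strs[0..n-1] that disagree. */
size_t count_disagreements(const char *const *strs, size_t n)
{
    size_t bad = 0;

    assert(strs != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (collation_agrees(strs[i], strs[j]) == 0)
                bad++;
    return bad;
}
```

To use it, a caller would first select the locale under test, for example with setlocale(LC_COLLATE, "de_DE.utf8"), and then pass an array of test strings to count_disagreements(). A nonzero count would indicate the inconsistency discussed in this thread. A count of zero only means that none of the tested pairs disagreed.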