Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date: 2016-03-23 01:20:00
Message-ID: 20160323012000.GK3127@tamriel.snowman.net
Lists: pgsql-bugs pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I was a little worried that it was too much to hope for that all libc
> > vendors on earth would ship a strxfrm() implementation that was actually
> > consistent with strcoll(), and here we are.
>
> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
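
For anyone reading along without the attachment, the kind of check Tom
describes can be approximated roughly as below. This is only a sketch in
Python, not his program: the names random_strings, find_inconsistencies and
check_locale are made up for illustration, the alphabet is arbitrary, and it
compares strings pairwise rather than sorting them both ways and diffing the
orders. Also note that CPython's locale module may call the wide-character
variants (wcscoll/wcsxfrm) underneath, so its results need not match what a C
program using strcoll/strxfrm on the same box reports.

```python
import locale
import random


def random_strings(count, length, alphabet, seed=0):
    """Return `count` pseudo-random strings of `length` chars from `alphabet`."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


def sign(n):
    """Collapse any integer to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def find_inconsistencies(strings):
    """Return the pairs (a, b) on which strcoll() and strxfrm() disagree.

    Uses whatever LC_COLLATE is currently in effect for the process.
    """
    keys = {s: locale.strxfrm(s) for s in strings}
    bad = []
    for i, a in enumerate(strings):
        for b in strings[i + 1:]:
            by_coll = sign(locale.strcoll(a, b))
            # Compare the transformed keys directly, as a sort on keys would.
            by_xfrm = (keys[a] > keys[b]) - (keys[a] < keys[b])
            if by_coll != by_xfrm:
                bad.append((a, b))
    return bad


def check_locale(name, count=1000):
    """Set LC_COLLATE to `name`; return True if no disagreement was found.

    `name` must be a locale installed on the system, otherwise
    locale.setlocale() raises locale.Error.
    """
    locale.setlocale(locale.LC_COLLATE, name)
    # Arbitrary sample alphabet (an assumption, not taken from Tom's program).
    sample = random_strings(count, 8, "abcäöüß ABC-.,012")
    return not find_inconsistencies(sample)
```

Calling check_locale("de_DE.utf8") on an affected system would be expected to
return False, but a True result from a random sample only means no mismatch
was found, not that the locale is safe.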

Results for Ubuntu 14.04:

sfrost@dwemer:/home/sfrost> sh tryalllocales.sh
Using LC_COLLATE = "C.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
C.UTF-8 good
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "en_US.UTF-8"
inconsistency between strcoll (36) and strxfrm (35) orders
inconsistency between strcoll (35) and strxfrm (36) orders
inconsistency between strcoll (160) and strxfrm (159) orders
inconsistency between strcoll (159) and strxfrm (160) orders
inconsistency between strcoll (347) and strxfrm (346) orders
inconsistency between strcoll (348) and strxfrm (347) orders
inconsistency between strcoll (346) and strxfrm (348) orders
inconsistency between strcoll (355) and strxfrm (353) orders
inconsistency between strcoll (353) and strxfrm (354) orders
inconsistency between strcoll (354) and strxfrm (355) orders
inconsistency between strcoll (440) and strxfrm (439) orders
inconsistency between strcoll (441) and strxfrm (440) orders
inconsistency between strcoll (439) and strxfrm (441) orders
inconsistency between strcoll (450) and strxfrm (449) orders
inconsistency between strcoll (449) and strxfrm (450) orders
inconsistency between strcoll (454) and strxfrm (452) orders
inconsistency between strcoll (455) and strxfrm (453) orders
inconsistency between strcoll (452) and strxfrm (454) orders
inconsistency between strcoll (453) and strxfrm (455) orders
inconsistency between strcoll (521) and strxfrm (520) orders
inconsistency between strcoll (520) and strxfrm (521) orders
inconsistency between strcoll (529) and strxfrm (528) orders
inconsistency between strcoll (528) and strxfrm (529) orders
inconsistency between strcoll (682) and strxfrm (681) orders
inconsistency between strcoll (681) and strxfrm (682) orders
inconsistency between strcoll (743) and strxfrm (742) orders
inconsistency between strcoll (742) and strxfrm (743) orders
inconsistency between strcoll (830) and strxfrm (829) orders
inconsistency between strcoll (829) and strxfrm (830) orders
inconsistency between strcoll (870) and strxfrm (869) orders
inconsistency between strcoll (869) and strxfrm (870) orders
inconsistency between strcoll (933) and strxfrm (931) orders
inconsistency between strcoll (931) and strxfrm (932) orders
inconsistency between strcoll (932) and strxfrm (933) orders
de_DE.utf8 BAD
Using LC_COLLATE = "en_US.utf8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.utf8 good

Thanks!

Stephen
