Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5);

From: "Reko Turja" <reko(dot)turja(at)liukuma(dot)net>
To: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5);
Date: 2016-03-24 14:02:08
Message-ID: 651C85285757450B8B88749830A53680@Rivendell
Lists: pgsql-bugs

Tom Lane wrote:

> Indeed. To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike. Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box. While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
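
For readers without the attachments, the kind of check described above can be sketched roughly as follows. This is not the attached program: it is a minimal Python illustration that assumes the standard locale module's strcoll and strxfrm reflect the C library's collation behaviour closely enough for the purpose. The names sign, random_string and count_mismatches, the alphabet argument and the pair count are all made up for this sketch.

```python
import locale
import random


def sign(n):
    """Collapse a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def random_string(rng, alphabet, max_len=8):
    """Build a short random string from the given alphabet (hypothetical helper)."""
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))


def count_mismatches(locale_name, alphabet, pairs=1000, seed=0):
    """Count random string pairs that strcoll and strxfrm order differently."""
    old = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)  # locale.Error if unavailable
    try:
        rng = random.Random(seed)
        mismatches = 0
        for _ in range(pairs):
            a = random_string(rng, alphabet)
            b = random_string(rng, alphabet)
            # Ordering according to a direct collation comparison.
            direct = sign(locale.strcoll(a, b))
            # Ordering according to a plain comparison of the transformed keys.
            xa, xb = locale.strxfrm(a), locale.strxfrm(b)
            via_keys = (xa > xb) - (xa < xb)
            if direct != via_keys:
                mismatches += 1
        return mismatches
    finally:
        locale.setlocale(locale.LC_COLLATE, old)  # restore the previous setting
```

A call such as count_mismatches("de_DE.UTF-8", "aäoöuüsß ") would exercise one locale, provided it is installed; a non-zero count would mean the two orderings disagree for some pair under that locale.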

Platform - FreeBSD 10.2, everything built from source using clang:

./tryalllocales.sh
Using LC_COLLATE = "af_ZA.UTF-8"
Using LC_CTYPE = "af_ZA.UTF-8"
af_ZA.UTF-8 good
Using LC_COLLATE = "am_ET.UTF-8"
Using LC_CTYPE = "am_ET.UTF-8"
am_ET.UTF-8 good
Using LC_COLLATE = "be_BY.UTF-8"
Using LC_CTYPE = "be_BY.UTF-8"
be_BY.UTF-8 good
Using LC_COLLATE = "bg_BG.UTF-8"
Using LC_CTYPE = "bg_BG.UTF-8"
bg_BG.UTF-8 good
Using LC_COLLATE = "ca_AD.UTF-8"
Using LC_CTYPE = "ca_AD.UTF-8"
ca_AD.UTF-8 good
Using LC_COLLATE = "ca_ES.UTF-8"
Using LC_CTYPE = "ca_ES.UTF-8"
ca_ES.UTF-8 good
Using LC_COLLATE = "ca_FR.UTF-8"
Using LC_CTYPE = "ca_FR.UTF-8"
ca_FR.UTF-8 good
Using LC_COLLATE = "ca_IT.UTF-8"
Using LC_CTYPE = "ca_IT.UTF-8"
ca_IT.UTF-8 good
Using LC_COLLATE = "cs_CZ.UTF-8"
Using LC_CTYPE = "cs_CZ.UTF-8"
cs_CZ.UTF-8 good
Using LC_COLLATE = "da_DK.UTF-8"
Using LC_CTYPE = "da_DK.UTF-8"
da_DK.UTF-8 good
Using LC_COLLATE = "de_AT.UTF-8"
Using LC_CTYPE = "de_AT.UTF-8"
de_AT.UTF-8 good
Using LC_COLLATE = "de_CH.UTF-8"
Using LC_CTYPE = "de_CH.UTF-8"
de_CH.UTF-8 good
Using LC_COLLATE = "de_DE.UTF-8"
Using LC_CTYPE = "de_DE.UTF-8"
de_DE.UTF-8 good
Using LC_COLLATE = "el_GR.UTF-8"
Using LC_CTYPE = "el_GR.UTF-8"
el_GR.UTF-8 good
Using LC_COLLATE = "en_AU.UTF-8"
Using LC_CTYPE = "en_AU.UTF-8"
en_AU.UTF-8 good
Using LC_COLLATE = "en_CA.UTF-8"
Using LC_CTYPE = "en_CA.UTF-8"
en_CA.UTF-8 good
Using LC_COLLATE = "en_GB.UTF-8"
Using LC_CTYPE = "en_GB.UTF-8"
en_GB.UTF-8 good
Using LC_COLLATE = "en_IE.UTF-8"
Using LC_CTYPE = "en_IE.UTF-8"
en_IE.UTF-8 good
Using LC_COLLATE = "en_NZ.UTF-8"
Using LC_CTYPE = "en_NZ.UTF-8"
en_NZ.UTF-8 good
Using LC_COLLATE = "en_US.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.UTF-8 good
Using LC_COLLATE = "es_ES.UTF-8"
Using LC_CTYPE = "es_ES.UTF-8"
es_ES.UTF-8 good
Using LC_COLLATE = "et_EE.UTF-8"
Using LC_CTYPE = "et_EE.UTF-8"
et_EE.UTF-8 good
Using LC_COLLATE = "eu_ES.UTF-8"
Using LC_CTYPE = "eu_ES.UTF-8"
eu_ES.UTF-8 good
Using LC_COLLATE = "fi_FI.UTF-8"
Using LC_CTYPE = "fi_FI.UTF-8"
fi_FI.UTF-8 good
Using LC_COLLATE = "fr_BE.UTF-8"
Using LC_CTYPE = "fr_BE.UTF-8"
fr_BE.UTF-8 good
Using LC_COLLATE = "fr_CA.UTF-8"
Using LC_CTYPE = "fr_CA.UTF-8"
fr_CA.UTF-8 good
Using LC_COLLATE = "fr_CH.UTF-8"
Using LC_CTYPE = "fr_CH.UTF-8"
fr_CH.UTF-8 good
Using LC_COLLATE = "fr_FR.UTF-8"
Using LC_CTYPE = "fr_FR.UTF-8"
fr_FR.UTF-8 good
Using LC_COLLATE = "he_IL.UTF-8"
Using LC_CTYPE = "he_IL.UTF-8"
he_IL.UTF-8 good
Using LC_COLLATE = "hr_HR.UTF-8"
Using LC_CTYPE = "hr_HR.UTF-8"
hr_HR.UTF-8 good
Using LC_COLLATE = "hu_HU.UTF-8"
Using LC_CTYPE = "hu_HU.UTF-8"
hu_HU.UTF-8 good
Using LC_COLLATE = "hy_AM.UTF-8"
Using LC_CTYPE = "hy_AM.UTF-8"
hy_AM.UTF-8 good
Using LC_COLLATE = "is_IS.UTF-8"
Using LC_CTYPE = "is_IS.UTF-8"
is_IS.UTF-8 good
Using LC_COLLATE = "it_CH.UTF-8"
Using LC_CTYPE = "it_CH.UTF-8"
it_CH.UTF-8 good
Using LC_COLLATE = "it_IT.UTF-8"
Using LC_CTYPE = "it_IT.UTF-8"
it_IT.UTF-8 good
Using LC_COLLATE = "ja_JP.UTF-8"
Using LC_CTYPE = "ja_JP.UTF-8"
ja_JP.UTF-8 good
Using LC_COLLATE = "kk_KZ.UTF-8"
Using LC_CTYPE = "kk_KZ.UTF-8"
kk_KZ.UTF-8 good
Using LC_COLLATE = "ko_KR.UTF-8"
Using LC_CTYPE = "ko_KR.UTF-8"
ko_KR.UTF-8 good
Using LC_COLLATE = "lt_LT.UTF-8"
Using LC_CTYPE = "lt_LT.UTF-8"
lt_LT.UTF-8 good
Using LC_COLLATE = "lv_LV.UTF-8"
Using LC_CTYPE = "lv_LV.UTF-8"
lv_LV.UTF-8 good
Using LC_COLLATE = "mn_MN.UTF-8"
Using LC_CTYPE = "mn_MN.UTF-8"
mn_MN.UTF-8 good
Using LC_COLLATE = "nb_NO.UTF-8"
Using LC_CTYPE = "nb_NO.UTF-8"
nb_NO.UTF-8 good
Using LC_COLLATE = "nl_BE.UTF-8"
Using LC_CTYPE = "nl_BE.UTF-8"
nl_BE.UTF-8 good
Using LC_COLLATE = "nl_NL.UTF-8"
Using LC_CTYPE = "nl_NL.UTF-8"
nl_NL.UTF-8 good
Using LC_COLLATE = "nn_NO.UTF-8"
Using LC_CTYPE = "nn_NO.UTF-8"
nn_NO.UTF-8 good
Using LC_COLLATE = "no_NO.UTF-8"
Using LC_CTYPE = "no_NO.UTF-8"
no_NO.UTF-8 good
Using LC_COLLATE = "pl_PL.UTF-8"
Using LC_CTYPE = "pl_PL.UTF-8"
pl_PL.UTF-8 good
Using LC_COLLATE = "pt_BR.UTF-8"
Using LC_CTYPE = "pt_BR.UTF-8"
pt_BR.UTF-8 good
Using LC_COLLATE = "pt_PT.UTF-8"
Using LC_CTYPE = "pt_PT.UTF-8"
pt_PT.UTF-8 good
Using LC_COLLATE = "ro_RO.UTF-8"
Using LC_CTYPE = "ro_RO.UTF-8"
ro_RO.UTF-8 good
Using LC_COLLATE = "ru_RU.UTF-8"
Using LC_CTYPE = "ru_RU.UTF-8"
ru_RU.UTF-8 good
Using LC_COLLATE = "sk_SK.UTF-8"
Using LC_CTYPE = "sk_SK.UTF-8"
sk_SK.UTF-8 good
Using LC_COLLATE = "sl_SI.UTF-8"
Using LC_CTYPE = "sl_SI.UTF-8"
sl_SI.UTF-8 good
Using LC_COLLATE = "sr_YU.UTF-8"
Using LC_CTYPE = "sr_YU.UTF-8"
sr_YU.UTF-8 good
Using LC_COLLATE = "sv_SE.UTF-8"
Using LC_CTYPE = "sv_SE.UTF-8"
sv_SE.UTF-8 good
Using LC_COLLATE = "tr_TR.UTF-8"
Using LC_CTYPE = "tr_TR.UTF-8"
tr_TR.UTF-8 good
Using LC_COLLATE = "uk_UA.UTF-8"
Using LC_CTYPE = "uk_UA.UTF-8"
uk_UA.UTF-8 good
Using LC_COLLATE = "zh_CN.UTF-8"
Using LC_CTYPE = "zh_CN.UTF-8"
zh_CN.UTF-8 good
Using LC_COLLATE = "zh_HK.UTF-8"
Using LC_CTYPE = "zh_HK.UTF-8"
zh_HK.UTF-8 good
Using LC_COLLATE = "zh_TW.UTF-8"
Using LC_CTYPE = "zh_TW.UTF-8"
zh_TW.UTF-8 good

-Reko
