case insensitive collation of Greek's sigma

From: Jakub Jedelsky <jakub(dot)jedelsky(at)gooddata(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: case insensitive collation of Greek's sigma
Date: 2021-11-26 07:37:47
Message-ID: CAC1JxDQhi_M4WO4e19qPmcOGpN6NKJz60zsdXMuCHwVuE3_Ldw@mail.gmail.com
Lists: pgsql-general

Hello,

During our tests of Postgres with ICU we found an issue with ILIKE and the
upper- and lowercase forms of sigma (Σ). The letter has two lowercase
variants: σ, and ς (used at the end of a word). I'm working with the en_US
and en-US-x-icu collations, and the results are a bit unexpected, because
the two collations give opposite answers:

postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
postgres-# ;
 ?column? | ?column?
----------+----------
 t        | f
(1 row)

postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
 ?column? | ?column?
----------+----------
 f        | t
(1 row)

I ran these commands on the latest official Docker image (14.1).

Is it possible to unify the behaviour? And which one is correct from the
community's point of view?

If I may start: I think both results are wrong, as both comparisons should
return true. If I understand correctly, a lower() function runs in the
background to compare the strings, which is not enough for such cases
(unless the left side is treated as a standalone word).
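
To illustrate the point outside the database, here is a minimal Python sketch
of the two lowercasing strategies that could produce the inverted results
above. It is an assumption, not something confirmed in this message, that the
libc collation lowercases one character at a time while ICU applies Unicode's
context-sensitive final-sigma rule. The helper names are hypothetical, and
Python's built-in str.lower() and str.casefold() stand in for the provider
functions.

```python
# Sketch only: models the suspected behaviour, not PostgreSQL's actual code.

def lower_per_char(s: str) -> str:
    """Lowercase each character in isolation (assumed libc-like behaviour).

    A lone 'Σ' has no word context, so it always becomes 'σ'.
    """
    return "".join(ch.lower() for ch in s)


def lower_full_string(s: str) -> str:
    """Lowercase the whole string (assumed ICU-like behaviour).

    Python's str.lower() applies the final-sigma rule: a 'Σ' at the end
    of a word becomes 'ς' instead of 'σ'.
    """
    return s.lower()


def ilike_via(lower, text: str, pattern: str) -> bool:
    """Compare two wildcard-free strings after lowercasing both sides."""
    return lower(text) == lower(pattern)


def ilike_via_casefold(text: str, pattern: str) -> bool:
    """Compare using Unicode case folding, which maps 'ς' and 'Σ' to 'σ'."""
    return text.casefold() == pattern.casefold()


if __name__ == "__main__":
    for pattern in ("σσ", "σς"):
        print(
            pattern,
            ilike_via(lower_per_char, "ΣΣ", pattern),
            ilike_via(lower_full_string, "ΣΣ", pattern),
            ilike_via_casefold("ΣΣ", pattern),
        )
```

In this model the per-character comparison matches only 'σσ', the
whole-string comparison matches only 'σς', and the case-folding comparison
matches both, which is the result I would expect from ILIKE.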

Thanks,

- jj
