Quick Links

Re: case insensitive collation of Greek's sigma

From:	Frank Limpert <frank_limpert(at)yahoo(dot)com>
To:	Jakub Jedelsky <jakub(dot)jedelsky(at)gooddata(dot)com>
Cc:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: case insensitive collation of Greek's sigma
Date:	2021-12-01 07:50:56
Message-ID:	ca27a92d-9af1-7f39-c04d-44ffcd05de64@yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Am 26.11.21 um 08:37 schrieb Jakub Jedelsky:
> Hello,
>
> during our tests of Postgres with ICU we found an issue with ILIKE of
> upper and lowercase sigma (Σ). The letter has two lowercase variants σ
> and ς (at the end of a word). I'm working with en_US and en-US-x-icu
> collations and results are a bit unexpected - they are inverted:
>
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
> postgres-# ;
> ?column? | ?column?
> ----------+----------
> t | f
> (1 row)
>
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
> ?column? | ?column?
> ----------+----------
> f | t
> (1 row)
>
> I run those commands on the latest (14.1) official docker image.
>
> Is it possible to unify the behaviour?And which one is correct from
> the community point of view?
>
> If I could start, I think both results are wrong as both should return
> True. If I got it right, in the background there is a lower() function
> running to compare strings, which is not enough for such cases (until
> the left side isn't taken as a standalone word).
>
> Thanks,
>
> - jj

Have you seen the subtle intricacies in this example?

=> SELECT 'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en_US" AS c0,
      'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en_US" AS c1,
      'ΣΣ Μ' ~*    'σσ Μ' COLLATE "el-GR-x-icu" AS c2,
      'ΣΣ Μ' ~*    'σς Μ' COLLATE "el-GR-x-icu" AS c3,
      'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "el-GR-x-icu" AS c4,
      'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "el-GR-x-icu" AS c5,
      'ΣΣ Μ' ~*    'σσ Μ' COLLATE "en-US-x-icu" AS c6,
      'ΣΣ Μ' ~*    'σς Μ' COLLATE "en-US-x-icu" AS c7,
      'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en-US-x-icu" AS c8,
      'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en-US-x-icu" AS c9;
c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9
----+----+----+----+----+----+----+----+----+----
t | f | t | t | f | t | t | t | f | t
(1 row)

Obviously, the ILIKE operator is really strict regarding to the correct
letter at the end of the word. The regular expression operator works as
you expected.

Happy computing...
Frank

In response to

case insensitive collation of Greek's sigma at 2021-11-26 07:37:47 from Jakub Jedelsky

Browse pgsql-general by date

	From	Date	Subject
Next Message	Dilip Kumar	2021-12-01 08:50:38	Re: Max connections reached without max connections reached
Previous Message	Patrick FICHE	2021-12-01 07:50:44	RE: Wildcarding json keys in json query