From: | Frank Limpert <frank_limpert(at)yahoo(dot)com> |
---|---|
To: | Jakub Jedelsky <jakub(dot)jedelsky(at)gooddata(dot)com> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: case insensitive collation of Greek's sigma |
Date: | 2021-12-01 07:50:56 |
Message-ID: | ca27a92d-9af1-7f39-c04d-44ffcd05de64@yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Am 26.11.21 um 08:37 schrieb Jakub Jedelsky:
> Hello,
>
> during our tests of Postgres with ICU we found an issue with ILIKE of
> upper and lowercase sigma (Σ). The letter has two lowercase variants σ
> and ς (at the end of a word). I'm working with en_US and en-US-x-icu
> collations and results are a bit unexpected - they are inverted:
>
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
> postgres-# ;
> ?column? | ?column?
> ----------+----------
> t | f
> (1 row)
>
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
> ?column? | ?column?
> ----------+----------
> f | t
> (1 row)
>
> I run those commands on the latest (14.1) official docker image.
>
> Is it possible to unify the behaviour?And which one is correct from
> the community point of view?
>
> If I could start, I think both results are wrong as both should return
> True. If I got it right, in the background there is a lower() function
> running to compare strings, which is not enough for such cases (until
> the left side isn't taken as a standalone word).
>
> Thanks,
>
> - jj
Have you seen the subtle intricacies in this example?
=> SELECT 'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en_US" AS c0,
'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en_US" AS c1,
'ΣΣ Μ' ~* 'σσ Μ' COLLATE "el-GR-x-icu" AS c2,
'ΣΣ Μ' ~* 'σς Μ' COLLATE "el-GR-x-icu" AS c3,
'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "el-GR-x-icu" AS c4,
'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "el-GR-x-icu" AS c5,
'ΣΣ Μ' ~* 'σσ Μ' COLLATE "en-US-x-icu" AS c6,
'ΣΣ Μ' ~* 'σς Μ' COLLATE "en-US-x-icu" AS c7,
'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en-US-x-icu" AS c8,
'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en-US-x-icu" AS c9;
c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9
----+----+----+----+----+----+----+----+----+----
t | f | t | t | f | t | t | t | f | t
(1 row)
Obviously, the ILIKE operator is really strict regarding to the correct
letter at the end of the word. The regular expression operator works as
you expected.
Happy computing...
Frank
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2021-12-01 08:50:38 | Re: Max connections reached without max connections reached |
Previous Message | Patrick FICHE | 2021-12-01 07:50:44 | RE: Wildcarding json keys in json query |