Quick Links

Re: case insensitive collation of Greek's sigma

From:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To:	Jakub Jedelsky <jakub(dot)jedelsky(at)gooddata(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: case insensitive collation of Greek's sigma
Date:	2021-12-01 19:29:33
Message-ID:	9e3220da-d47e-add7-8b97-7c65b12ff6d7@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 26.11.21 08:37, Jakub Jedelsky wrote:
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
> postgres-# ;
> ?column? | ?column?
> ----------+----------
> t | f
> (1 row)
>
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
> ?column? | ?column?
> ----------+----------
> f | t
> (1 row)
>
> If I could start, I think both results are wrong as both should return
> True. If I got it right, in the background there is a lower() function
> running to compare strings, which is not enough for such cases (until
> the left side isn't taken as a standalone word).

The reason for these results is that for multibyte encodings, a ILIKE b
basically does lower(a) LIKE lower(b), and

select lower('ΣΣ' COLLATE "en_US"), lower('ΣΣ' COLLATE "en-US-x-icu");
lower | lower
-------+-------
σσ | σς

Running lower() like this is really the wrong thing to do. We should be
doing "case folding" instead, which normalizes these differences for the
purpose of case-insensitive comparisons.

In response to

case insensitive collation of Greek's sigma at 2021-11-26 07:37:47 from Jakub Jedelsky

Responses

Re: case insensitive collation of Greek's sigma at 2021-12-01 19:49:24 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Adrian Klaver	2021-12-01 19:31:27	Re: INSERT ... ON CONFLICT doesn't work
Previous Message	Jenda Krynicky	2021-12-01 19:20:56	INSERT ... ON CONFLICT doesn't work