Re: case insensitive collation of Greek's sigma

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc:	Jakub Jedelsky <jakub(dot)jedelsky(at)gooddata(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: case insensitive collation of Greek's sigma
Date:	2021-12-01 19:49:24
Message-ID:	1989905.1638388164@sss.pgh.pa.us
Lists:	pgsql-general

Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> writes:
> Running lower() like this is really the wrong thing to do. We should be
> doing "case folding" instead, which normalizes these differences for the
> purpose of case-insensitive comparisons.

That just begs the question: if tolower (or towlower) isn't the
appropriate API, what is? Perhaps ICU has something for a more
generalized notion of case-similarity, but I'm not aware of any such
thing in the POSIX API.
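
For illustration only, and not as a POSIX or PostgreSQL API: Python's built-in str.casefold() applies Unicode case folding, so it can serve as a rough sketch of the distinction between lowercasing and case folding. The helper names below are hypothetical, and Python's str.lower() is only a stand-in for tolower()/towlower(), whose results depend on the C library and locale.

```python
# Sketch: comparing strings via lowercasing versus via case folding.
# Helper names are hypothetical; str.lower() stands in for towlower().

SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς
CAPITAL_SIGMA = "\u03a3"  # Σ


def equal_by_lower(a: str, b: str) -> bool:
    """Case-insensitive comparison done by lowercasing both sides."""
    return a.lower() == b.lower()


def equal_by_casefold(a: str, b: str) -> bool:
    """Case-insensitive comparison done by Unicode case folding."""
    return a.casefold() == b.casefold()


# σ and ς are both already lowercase, so lowercasing keeps them distinct.
print(equal_by_lower(SMALL_SIGMA, FINAL_SIGMA))     # False
# Case folding maps ς to σ, so the two compare equal.
print(equal_by_casefold(SMALL_SIGMA, FINAL_SIGMA))  # True
```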

BTW, I think it's only accidental that the regex example shown upthread
gets the right answer. In that example, what's happening is that we
consider a letter in a case-insensitive regex to match itself, or
tolower() of itself, or toupper() of itself. Both σ and ς have Σ
as toupper() so they both work. But if you'd written Σ in the regex,
only one of σ and ς would match that as a data character. (Haven't
actually tested this, but given the way the code works I'm pretty
sure it's so.) Again, it's hard to see how to do better atop a POSIX
locale library.
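
The matching rule described above can be modeled in a few lines of Python. This is a simplified sketch, not the actual regex code: ci_char_matches is a hypothetical name, and Python's str.lower()/str.upper() stand in for tolower()/toupper(), whose behavior in a real POSIX locale may differ.

```python
# Simplified model of the rule: a pattern letter in a case-insensitive regex
# matches itself, its lowercase form, or its uppercase form.
# ci_char_matches is a hypothetical name, not a PostgreSQL function.

SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς
CAPITAL_SIGMA = "\u03a3"  # Σ


def ci_char_matches(pattern_ch: str, data_ch: str) -> bool:
    """Return True if data_ch is in the pattern letter's case-variant set."""
    variants = {pattern_ch, pattern_ch.lower(), pattern_ch.upper()}
    return data_ch in variants


# Pattern σ or ς: both have Σ as their uppercase form, so Σ in the data matches.
print(ci_char_matches(SMALL_SIGMA, CAPITAL_SIGMA))  # True
print(ci_char_matches(FINAL_SIGMA, CAPITAL_SIGMA))  # True

# Pattern Σ: its variant set is {Σ, σ}, so only one lowercase sigma matches.
print(ci_char_matches(CAPITAL_SIGMA, SMALL_SIGMA))  # True
print(ci_char_matches(CAPITAL_SIGMA, FINAL_SIGMA))  # False
```

In this model the asymmetry comes from the lowercase mapping returning a single character for Σ, so the other lowercase form never enters the variant set.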

regards, tom lane

In response to

Re: case insensitive collation of Greek's sigma at 2021-12-01 19:29:33 from Peter Eisentraut

Responses

Re: case insensitive collation of Greek's sigma at 2021-12-02 13:26:39 from Jakub Jedelsky
