From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> |
Cc: | Jakub Jedelsky <jakub(dot)jedelsky(at)gooddata(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: case insensitive collation of Greek's sigma |
Date: | 2021-12-01 19:49:24 |
Message-ID: | 1989905.1638388164@sss.pgh.pa.us |
Lists: | pgsql-general |
Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> writes:
> Running lower() like this is really the wrong thing to do. We should be
> doing "case folding" instead, which normalizes these differences for the
> purpose of case-insensitive comparisons.
That just begs the question: if tolower (or towlower) isn't the
appropriate API, what is? Perhaps ICU has something for a more
generalized notion of case-similarity, but I'm not aware of any such
thing in the POSIX API.
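As a rough illustration of what "case folding" means here (and not of any POSIX or PostgreSQL API), Python's `str.casefold()` applies Unicode case folding, under which the three sigma forms collapse to a single key, while `str.lower()` keeps the final sigma distinct. The helper name `fold_equal` is made up for this sketch.

```python
# Sketch: compare lower() with Unicode case folding for the Greek sigmas.
# fold_equal is a hypothetical helper, not part of any library.

SIGMAS = ["Σ", "σ", "ς"]  # capital, small, small final


def fold_equal(a, b):
    """Case-insensitive equality via Unicode case folding."""
    return a.casefold() == b.casefold()


# lower() leaves σ and ς as different strings ...
lowered = {s.lower() for s in SIGMAS}
# ... while casefold() maps all three forms to the same string.
folded = {s.casefold() for s in SIGMAS}

print("distinct after lower():   ", sorted(lowered))
print("distinct after casefold():", sorted(folded))
print("σ vs ς equal under folding:", fold_equal("σ", "ς"))
```

This only shows the behaviour of Python's built-in Unicode tables; a C program limited to tolower()/towlower() has no equivalent call to reach for.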
BTW, I think it's only accidental that the regex example shown upthread
gets the right answer. In that example, what's happening is that we
consider a letter in a case-insensitive regex to match itself, or
tolower() of itself, or toupper() of itself. Both σ and ς have Σ
as toupper() so they both work. But if you'd written Σ in the regex,
only one of σ and ς would match that as a data character. (Haven't
actually tested this, but given the way the code works I'm pretty
sure it's so.) Again, it's hard to see how to do better atop a POSIX
locale library.
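The matching rule described above can be modelled in a few lines. This is only a sketch: it uses Python's Unicode case mappings as a stand-in for the locale's tolower()/toupper(), which is an assumption, and the function names `ci_variants` and `ci_char_match` are hypothetical, not taken from the regex code.

```python
# Model of the rule described above: in a case-insensitive regex, a
# pattern letter matches itself, its lowercase form, or its uppercase form.
# Assumption: Python's str.lower()/str.upper() stand in for the locale's
# tolower()/toupper(); a real POSIX locale may behave differently.


def ci_variants(pattern_ch):
    """Set of data characters that a single pattern letter accepts."""
    return {pattern_ch, pattern_ch.lower(), pattern_ch.upper()}


def ci_char_match(pattern_ch, data_ch):
    """True if data_ch is accepted by pattern_ch under the rule above."""
    return data_ch in ci_variants(pattern_ch)


# Try every sigma form as a pattern letter against every form as data.
for pattern_ch in "σςΣ":
    accepted = [d for d in "σςΣ" if ci_char_match(pattern_ch, d)]
    print("pattern", pattern_ch, "accepts", accepted)
```

Under these mappings, the patterns σ and ς each accept Σ, while the pattern Σ accepts σ but not ς, because ς is neither the lowercase nor the uppercase form of Σ.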
regards, tom lane