From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, patrick(dot)van(dot)dijk(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18523: String compare not consistent |
Date: | 2024-06-26 18:30:56 |
Message-ID: | 30c9e733-9573-4500-b191-5a8a67b3795f@eisentraut.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 26.06.24 17:59, Laurenz Albe wrote:
> Interestingly, glibc and ICU disagree about that:
>
> SELECT '|1' < '01' COLLATE "de_DE.utf8" AS glibc,
> '|1' < '01' COLLATE "de-AT-x-icu" AS icu;
>
> glibc │ icu
> ═══════╪═════
> f │ t
> (1 row)
Let's work this out.
The collation weights of the characters involved are:
0030 ; [.209F.0020.0002] # DIGIT ZERO
0031 ; [.20A0.0020.0002] # DIGIT ONE
007C ; [*06AF.0020.0002] # VERTICAL LINE
So under the ICU default "noignore" behavior, the sort keys are:
'|1' => 06AF 20A0 0000 0020 0020 0000 0002 0002
'01' => 209F 20A0 0000 0020 0020 0000 0002 0002
Therefore, '|1' < '01'.
Under "shifted", the effective collation weights are:
0030 ; [.209F.0020.0002.FFFF] # DIGIT ZERO
0031 ; [.20A0.0020.0002.FFFF] # DIGIT ONE
007C ; [.0000.0000.0000.06AF] # VERTICAL LINE
and the sort keys are:
'|1' => 20A0 0000 0020 0000 0002 0000 06AF FFFF
'01' => 209F 20A0 0000 0020 0020 0000 0002 0002 0000 FFFF FFFF
Therefore, '|1' > '01'. That is what you get in glibc. (Older glibc
use "shift-trimmed" behavior, which trims the trailing FFFF's, which
doesn't affect the result in this case.)
Under ICU (or rather CLDR), the '|' character is not actually a variable
collation element by default, so you actually get the same result if you
ask for "noignore" or "shifted" behavior. But you can make symbols
variable with 'und-u-ka-shifted-kv-symbol', and then you will also get
the same result as glibc.
Further reading:
-
https://peter.eisentraut.org/blog/2023/04/12/how-collation-of-punctuation-and-whitespace-works
-
https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2024-06-26 20:58:13 | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
Previous Message | Tom Lane | 2024-06-26 16:04:04 | Re: BUG #18523: String compare not consistent |