Re: pg_collation.collversion for C.UTF-8

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_collation.collversion for C.UTF-8
Date: 2023-04-18 19:48:05
Message-ID: CA+hUKGLALgS3bFStFrv26mV9JahZzAbAVyk3+03QZVpJDrrFvg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 19, 2023 at 12:36 AM Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> This seems to be based on the idea that C.* collations provide an
> immutable sort like "C", but it appears that it's not the case.

Hmm. It seems I initially added that exemption for FreeBSD only, in
ca051d8b101, and then merged the cases for several OSes in
beb4480c853.

It's extremely surprising to me that the sort order changed. I
expected the sort order to be code point order:

https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

One interesting thing is that it seems to have been invented
independently by Debian (?) and then harmonised with glibc
2.35:

https://www.mail-archive.com/debian-bugs-dist(at)lists(dot)debian(dot)org/msg1871363.html

Was the earlier Debian version buggy, or did it simply have a
different idea of what the sort order should be, intentionally? Ugh.
From your examples, we can see that the older Debian system did not
have A < [some 4-digit code point], while the later version did (as
expected). If so, then it might be tempting to *not* do what you're
suggesting, since the stated goal of the thing is to be stable from
now on. But it changed once in the early years of its existence!
Annoying.

Many OSes have a locale with this name. I don't know the history
(who did it first, etc.), but now I am wondering whether they all took
the "obvious" interpretation, that it should be code-point based,
extrapolating from "C" (really memcmp order):

https://unix.stackexchange.com/questions/597962/how-widespread-is-the-c-utf-8-locale
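A minimal illustrative sketch of why code point order is the "obvious" extrapolation from "C": for UTF-8, comparing the encoded bytes (memcmp-style) yields the same order as comparing code points, so 'A' (U+0041) sorts before any 4-hex-digit code point such as U+0100. The function names and sample strings below are hypothetical, and this does not query any OS's actual C.UTF-8 locale; what a given libc really does would have to be checked with strcoll()/strxfrm() under that locale.

```python
from __future__ import annotations


def memcmp_key(s: str) -> bytes:
    """Byte-wise sort key: the UTF-8 encoding, compared like memcmp."""
    return s.encode("utf-8")


def codepoint_key(s: str) -> list[int]:
    """Sort key built from the Unicode code point of each character."""
    return [ord(ch) for ch in s]


def same_order(strings: list[str]) -> bool:
    """True if byte order and code point order sort `strings` identically."""
    return sorted(strings, key=memcmp_key) == sorted(strings, key=codepoint_key)


# Hypothetical sample: ASCII letters, a 4-hex-digit code point (U+0100),
# a CJK character (U+4E2D) and a supplementary-plane character (U+1F600).
samples = ["b", "A", "a", "\u0100", "\u4e2d", "\U0001F600", "Z"]

if __name__ == "__main__":
    # Print each string's first code point next to its UTF-8 bytes,
    # in memcmp (byte-wise) order.
    for s in sorted(samples, key=memcmp_key):
        print(f"U+{ord(s[0]):04X}", memcmp_key(s).hex())
    print("byte order == code point order:", same_order(samples))
```

The property holds because UTF-8 lead bytes grow with sequence length and continuation bytes preserve the code point's bit order, which is exactly why a code-point-ordered C.UTF-8 could reuse plain memcmp.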
