Re: pg_collation.collversion for C.UTF-8

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pg_collation.collversion for C.UTF-8
Date:	2023-04-19 02:07:13
Message-ID:	CA+hUKGKTAEOvh72BoUKX6iwRJ0p3OGXFp1Az96NZ7fXemt33rw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 19, 2023 at 1:30 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote:
> > Many OSes have a locale with this name. I don't know this history,
> > who did it first etc, but now I am wondering if they all took the
> > "obvious" interpretation, that it should be code-point based,
> > extrapolating from "C" (really memcmp order):
>
> memcmp() is not the same as code-point order in all encodings, right?

Right. I wasn't trying to suggest that *we* should assume that, I was
just thinking out loud about how a libc implementor would surely think
that a "C.encoding" should work, in the spirit of "C", given that the
standard doesn't tell us, IIUC. It looks like, for technical reasons
inside glibc, that couldn't be done before version 2.35:

https://sourceware.org/bugzilla/show_bug.cgi?id=17318

That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied by
the glibc project) isn't supposed to be versioned, but it's extremely
unfortunate that a bunch of OSes (Debian and maybe more) have been
sorting text in some other order under that name for years.

> I've been thinking that we should have a "provider=none" for the
> special cases that use memcmp(). It's not using libc as a collation
> provider; it's really postgres in control of the semantics.

Yeah, interesting idea.
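
To make the memcmp() vs. code-point question above concrete, here is a
small sketch. It is only an illustration, not something taken from glibc
or PostgreSQL: Python's comparison of bytes objects (lexicographic, by
unsigned byte value) stands in for memcmp(), and the sample strings and
helper names are made up for the example. It shows that byte order and
code-point order agree for UTF-8 but can differ for UTF-16, where
characters outside the BMP are stored as surrogate pairs.

```python
# Illustration only: byte-wise (memcmp-like) order vs. code-point order.
# Python compares bytes objects lexicographically by unsigned byte value,
# which is used here as a stand-in for memcmp(). All names are hypothetical.

def codepoint_key(s):
    """Sort key: the sequence of Unicode code points in the string."""
    return [ord(ch) for ch in s]

def bytewise_key(s, encoding):
    """Sort key: the raw encoded bytes, compared like memcmp() would."""
    return s.encode(encoding)

# Assumed sample data: one character each from several code-point ranges.
samples = [
    "\U0001F600",  # U+1F600, outside the BMP (a surrogate pair in UTF-16)
    "\uFF21",      # U+FF21, inside the BMP but above the surrogate range
    "z",           # U+007A
    "\u00E9",      # U+00E9
]

by_codepoint = sorted(samples, key=codepoint_key)
by_utf8 = sorted(samples, key=lambda s: bytewise_key(s, "utf-8"))
by_utf16 = sorted(samples, key=lambda s: bytewise_key(s, "utf-16-be"))

def show(label, strings):
    """Print a sort order as a list of U+XXXX code points."""
    print(label, [" ".join(f"U+{ord(c):04X}" for c in s) for s in strings])

if __name__ == "__main__":
    show("code point:", by_codepoint)
    show("UTF-8 bytes:", by_utf8)
    show("UTF-16BE bytes:", by_utf16)
```

With these samples, the UTF-8 byte order matches the code-point order,
while the UTF-16BE byte order puts U+1F600 before U+FF21 because its
leading surrogate starts with byte 0xD8. This says nothing about what
any particular libc's C.UTF-8 locale actually does.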

In response to

Re: pg_collation.collversion for C.UTF-8 at 2023-04-19 01:30:13 from Jeff Davis

Responses

Re: pg_collation.collversion for C.UTF-8 at 2023-04-22 17:22:24 from Daniel Verite
Re: pg_collation.collversion for C.UTF-8 at 2023-05-25 18:30:11 from Jeff Davis
