Re: Unicode FFFF Special Codepoint should always collate high.

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Telford Tendys <psql(at)lnx-bsp(dot)net>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Unicode FFFF Special Codepoint should always collate high.
Date: 2021-06-23 22:29:15
Message-ID: CA+hUKGKcTvSMbqTnOcKxyOMAo6fKkc7FW5qNLsoxMyiK6pB=kQ@mail.gmail.com
Lists: pgsql-bugs

On Wed, Jun 23, 2021 at 9:57 PM Telford Tendys <psql(at)lnx-bsp(dot)net> wrote:
> I trust those guys, they will figure it out. I strongly predict that
> they will keep the behaviour consistent with RHEL 7.

I doubt that. It's well known that glibc 2.28 (the version RHEL 8
upgraded to) included changes that affected everybody by changing the
sort order of common symbols like '-' (though every upgrade potentially
contains subtle changes affecting just a few specific languages). I
consider that recent big change an improvement, because glibc now
agrees more often with other operating systems and libraries that use
CLDR. Even if you are right that FFFF's sort-high rule should be
exposed to users (need references), RHEL 7 was also wrong in that case.

> Is there an easy way to make normal Linux glibc utilities (e.g. sort)
> use a locale from the ICU library? There's a package available on CentOS-8
> here:
>
> libicu-60.3-2.el8_1.x86_64
>
> Trouble is that only a few applications use it, and I can't find any way
> to plug-in / plug-out this functionality. Introducing postgresql details to
> the bugzilla ticket will muddy the water and create an aura of diffuse
> responsibility. What I've found is generally where there's a lot of words,
> people don't read them.

I don't know, but since you know Perl, it might be easy to make a
demonstration with https://metacpan.org/pod/Unicode::ICU::Collator.
Looks as simple as $collator->sort(@my_list).
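
For comparison, here is a rough sketch of the same kind of
demonstration using only Python's standard locale module, which goes
through the C library (glibc on Linux) rather than ICU. The sample
strings and the helper names by_code_point and by_locale are made up
for illustration, and the locale-based result depends entirely on the
libc version and the locales installed on the machine, so no expected
output is given for it.

```python
import locale

# Hypothetical sample data: plain letters, a word containing '-',
# and one string that starts with the special code point U+FFFF.
SAMPLES = ["b", "a-b", "ab", "\uffffa", "a"]


def by_code_point(words):
    """Sort by raw Unicode code point; U+FFFF lands last among BMP characters."""
    return sorted(words)


def by_locale(words, locale_name=""):
    """Sort using the C library's collation rules for locale_name.

    An empty name means "use the environment's locale". Returns None if
    the locale is not installed or the string transform fails.
    """
    try:
        previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    except (OSError, ValueError):
        return None
    finally:
        # Put the collation locale back the way we found it.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    print("code point:", [ascii(w) for w in by_code_point(SAMPLES)])
    result = by_locale(SAMPLES)
    if result is None:
        print("locale collation unavailable")
    else:
        print("locale    :", [ascii(w) for w in result])
```

Running this on a RHEL 7 box and a RHEL 8 box with the same locale
setting should make any difference in where '-' and U+FFFF land easy
to see side by side.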
