Re: Moving from Linux to Linux?

From: Joe Conway <mail(at)joeconway(dot)com>
To: Paul Foerster <paul(dot)foerster(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Pgsql-General List <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Moving from Linux to Linux?
Date: 2025-03-13 13:14:46
Message-ID: 330ca171-623a-44ab-91a0-7dbf7520fb83@joeconway.com
Lists: pgsql-general

On 3/13/25 06:10, Paul Foerster wrote:
>> The other option, which may be equally untenable, is to upgrade in-
>> place to pg17 and convert everything to use the new built-in
>> collation provider. That ought to be portable across different
>> versions of Linux.

> Is C.UTF8 really the same as en_US.UTF8? I ask because though we use
> en_US.UTF8, we are located in Switzerland and using non-English
> characters is not exactly the exception. We have characters from all
> over the world in our databases. There must be no sorting
> differences between en_US.UTF8 and C.UTF8. Otherwise we will run
> into trouble with unhappy customers. So, C.UTF8 would only be an
> option if the collation would be identical.

Definitely not exactly the same. C.UTF8 does handle all the same
characters (UTF8), but it sorts them by code point rather than by
linguistic rules.
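
As a rough illustration of the difference (a sketch, not output from PostgreSQL itself): a C-style collation such as C.UTF8 compares strings by code point, which Python's default string comparison also does. The word list below is made up for the example. A linguistic locale such as en_US.UTF8 would typically place "Äpfel" next to the other "a" words instead of after "zebra".

```python
# Hypothetical sample data, chosen to show case and accent effects.
words = ["zebra", "Zebra", "apple", "Äpfel", "Apple"]

# Python compares str values by code point, which is the same rule a
# C-style collation applies.
by_code_point = sorted(words)

# Sorting by the UTF-8 encoded bytes gives the same order, because
# UTF-8 byte order agrees with code point order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# Uppercase ASCII sorts before lowercase ASCII, and non-ASCII letters
# such as "Ä" sort after all ASCII letters.
print(by_code_point)  # ['Apple', 'Zebra', 'apple', 'zebra', 'Äpfel']
```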

How often do you really depend on the ordering to the client being
exactly what the end user wants to see? Often the final ordering is done
in the application. Or could be done. You can also use a COLLATE clause
to get exactly the ordering you want when you need it.

The builtin collation has two big advantages -- 1) it should be stable
and portable, and 2) it should perform substantially faster (in simple
tests I have seen 10x speedups).

>> The problem you might find with libicu is that different versions
>> of ICU can have the same issues as different versions of glibc,
>> and you might not have the same ICU version available on SLES and
>> RHEL.

> Yes, I know. As far as I have been told, libicu is far less prone to
> major collation changes than glibc is.

Less prone, yes, but it still happens. And when it happens you can get
corruption of your indexes, which can lead to things like duplicate
primary/unique keys and data going to the wrong partition, to cite two
examples.
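
To make the failure mode concrete, here is a minimal sketch of why a changed sort order breaks lookups in a sorted structure. It is a toy model, not how PostgreSQL B-trees are implemented: `old_collation` and `new_collation` are hypothetical stand-ins for two library versions that order the same strings differently, and `lookup` is a plain binary search.

```python
def lookup(index, value, sort_key):
    """Binary search that trusts `index` to be sorted by `sort_key`."""
    lo, hi = 0, len(index)
    target = sort_key(value)
    while lo < hi:
        mid = (lo + hi) // 2
        if sort_key(index[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == value


# Hypothetical "old" collation: case-insensitive, ties broken by code point.
def old_collation(s):
    return (s.casefold(), s)


# Hypothetical "new" collation: plain code point order.
def new_collation(s):
    return s


# The "index" was built while the old ordering was in effect.
index = sorted(["apple", "Banana", "cherry", "Date"], key=old_collation)

found_before = lookup(index, "Banana", old_collation)  # ordering matches
found_after = lookup(index, "Banana", new_collation)   # ordering changed

print(found_before, found_after)  # True False
```

The key is still physically present, but the search under the new ordering looks in the wrong place and reports it missing. A uniqueness check built on that search would then accept a second "Banana", which is the duplicate-key scenario described above.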

> Also, libicu offers the possibility to pin a version for a certain
> time. Our sysadmins will naturally not be able to pin a glibc
> version without wrecking an inevitable server upgrade.

Yes, in theory libicu can be pinned more easily than glibc, for sure.
The reality is that you would either need your Linux distro to provide
that pinned version as you upgrade, which I don't think any of them do
(or in the case of SUSE to RHEL they would have to match from the
get-go), or you would have to take on maintaining your own pinned version
going forward. That latter option is essentially the same as the glibc
compatibility library approach though, so perhaps not horrible.

>> If you want to explore the compatibility library approach contact
>> me off list and I will try to get you started. It has been a
>> couple of years since I touched it, but when I did it took me a
>> couple of days to get from the AL2 (glibc 2.26) branch (which was
>> done first) to the RHEL 7 (glibc 2.17) branch.

> I just took a quick glance. I don't have a Github account (and also
> don't want one 🤣). I can do a git clone, but that's basically all I
> know. Also, right now, I'm just exploring possibilities. As far as I
> understand the readme on Github, this will replace the glibc on Red
> Hat with one with adapted collation rules? If this is the case, then
> our admins will definitely say no to this.

No, it does not replace glibc. It extracts just the locale functionality
from glibc into its own portable library, pretty much like libicu. Then
you can link against it. So for example you wind up with "glibc 2.26
locale semantics" with Postgres on your Linux distro which has glibc
2.34 installed. All of the non-locale functionality comes from the
system glibc 2.34.

>> [1] https://www.joeconway.com/presentations/2025-PGConf.IN-glibc.pdf
> This is a really good one. Thanks very much for this.

You should probably watch this presentation in its entirety:
https://www.youtube.com/watch?v=KTA6oau7tl8
Jeremy does a really good job of dispelling misconceptions and if I
remember correctly Jeff Davis talks about the builtin provider.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
