From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Jeff Davis" <pgsql(at)j-davis(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_collation.collversion for C.UTF-8 |
Date: | 2023-06-05 17:43:26 |
Message-ID: | 5ad8d2f8-c11f-46d6-aab5-ed529d8e958a@manitou-mail.org |
Lists: | pgsql-hackers |
Jeff Davis wrote:
> > For libc: this change may affect any user who happened to have
> > LANG=C.UTF-8 in their environment at initdb time, which is probably a
> > lot of users, and some buildfarm members. However, the average risk
> > seems to be much lower, because we've gone a long time with the
> > assumption that C.UTF-8 has the same behavior as C, and this only
> > recently came up.
Currently, neither lc_collate_is_c() nor lookup_collation_cache()
treats C.UTF-8 as a C collation, since both use this kind of
exact-match test:
    if (strcmp(localeptr, "C") == 0)
        result = true;
    else if (strcmp(localeptr, "POSIX") == 0)
        result = true;
    else
        result = false;
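To make the consequence concrete, here is a minimal standalone sketch of that exact-match test. The helper name locale_name_is_c() is hypothetical and is not a PostgreSQL function; it only wraps the comparison quoted above.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns true only when the
 * locale name is exactly "C" or exactly "POSIX", like the test quoted above.
 */
static bool
locale_name_is_c(const char *localeptr)
{
    return strcmp(localeptr, "C") == 0 ||
           strcmp(localeptr, "POSIX") == 0;
}
```

With this test, locale_name_is_c("C") and locale_name_is_c("POSIX") are true, while locale_name_is_c("C.UTF-8") is false, because strcmp() only matches the whole string.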
What is relatively new (v15) is that we compute a version for libc
collations in get_collation_actual_version(), with code that assumes
that C.* does not need a version, implying that it's immune to
Unicode changes. What came up in this thread is that this assumption
is not true for at least one major platform: Debian/Ubuntu for
releases occurring before 2022 (glibc < 2.35).
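As a rough illustration of that assumption (not the actual code in get_collation_actual_version(), which may differ in details such as case sensitivity), a prefix test along these lines would treat any "C.*" name as needing no version. The helper name libc_locale_assumed_versionless() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): sketches the assumption that
 * "C", "POSIX" and any name starting with "C." need no collation version.
 */
static bool
libc_locale_assumed_versionless(const char *collcollate)
{
    return strcmp(collcollate, "C") == 0 ||
           strncmp(collcollate, "C.", 2) == 0 ||   /* prefix match: C.UTF-8, C.utf8, ... */
           strcmp(collcollate, "POSIX") == 0;
}
```

Under this sketch "C.UTF-8" gets no version, even though the exact-match test shown earlier does not consider it a C collation. That mismatch is the gap this thread is about.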
> We can avoid this risk by converting C.anything or POSIX.anything to
> plain "C" or "POSIX", respectively, for new collations before storing
> the string in the catalog. For upgraded collations, we can preserve the
> existing locale name. When opening the locale, we would still only
> recognize plain "C" and "POSIX" as the C locale.
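For reference, the proposed conversion could look roughly like the sketch below. It is only an illustration of the idea as quoted; normalize_c_locale_name() is a hypothetical name, and no such function is claimed to exist in PostgreSQL.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): maps "C.anything" to "C" and
 * "POSIX.anything" to "POSIX"; every other locale name is returned unchanged.
 */
static const char *
normalize_c_locale_name(const char *name)
{
    if (strcmp(name, "C") == 0 || strncmp(name, "C.", 2) == 0)
        return "C";
    if (strcmp(name, "POSIX") == 0 || strncmp(name, "POSIX.", 6) == 0)
        return "POSIX";
    return name;
}
```

So normalize_c_locale_name("C.UTF-8") yields "C", while a name such as "en_US.UTF-8" is stored as given.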
Then Postgres would not sort the same way as the operating system does
for the same locale, at least on some operating systems. As for glibc,
in a few years glibc < 2.35 will be obsolete, and C.UTF-8 will sort
like C without any action on our part.
But in the meantime, personally I don't quite see why Postgres should
start forcing C.UTF-8 to sort differently in the database than in the
OS.
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite