From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-07-18 23:39:08 |
Message-ID: | 20240718233908.52.nmisch@google.com |
Lists: | pgsql-hackers |
On Thu, Jul 18, 2024 at 01:03:31PM -0400, Tom Lane wrote:
> This whole discussion seems quite bizarre to me. In the first
> place, it is certain that Unicode will continue to evolve, and
> I can't believe that we'd just freeze pg_c_utf8 on the current
> definition forever. Whether the first change happens in v18
> or years later doesn't seem like a particularly critical point.
>
> In the second place, I cannot understand why pg_c_utf8 is being
> held to a mutability standard that we have never applied to any
> other locale-related functionality --- and indeed could not do
> so, since in most cases that functionality has been buried in
> libraries we don't control. It seems to me to be already a
With libc and ICU providers, packagers have a way to avoid locale-related
behavior changes. That's the "mutability standard" I want pg_c_utf8 to join.
pg_c_utf8 is the one provider where packagers can't opt out[1] of annual
pg_upgrade-time index scan breakage on affected expression indexes.
> great step forward that with pg_c_utf8, at least we can guarantee
> that the behavior won't change without us knowing about it.
> Noah's desire to revert the feature makes the mutability situation
> strictly worse, because people will have to continue to rely on
> OS-provided functionality that can change at any time.
I see:
- one step forward:
  "string1 < string2" won't change, forever, regardless of packager choices
- one step backward:
  "string ~ '[[:alpha:]]'" will change at pg_upgrade time, regardless of packager choices
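As a rough illustration of the step backward (not PostgreSQL code), the sketch below simulates an expression index whose stored result depends on character classification. It assumes that "alphabetic" can be approximated by Unicode general categories beginning with "L", and it uses Python's bundled Unicode 3.2.0 tables as a stand-in for the "old" Unicode version. The helper names is_alpha and build_index are hypothetical.

```python
import unicodedata

# Assumption: approximate a '[[:alpha:]]' test by "general category starts
# with L". This is a simplification, not pg_c_utf8's actual definition.
def is_alpha(ch, ucd=unicodedata):
    return ucd.category(ch).startswith("L")

def matches_alpha(text, ucd):
    """Stand-in for the expression (text ~ '[[:alpha:]]')."""
    return any(is_alpha(c, ucd) for c in text)

def build_index(rows, ucd):
    """Simulate an expression index: results are computed once, at build time."""
    return {row: matches_alpha(row, ucd) for row in rows}

old_ucd = unicodedata.ucd_3_2_0  # Unicode 3.2.0 tables shipped with Python
new_ucd = unicodedata            # the Unicode version of the running Python

# U+0221 LATIN SMALL LETTER D WITH CURL was added in Unicode 4.0, so it is
# unassigned under the old tables and a lowercase letter under the new ones.
rows = ["123", "abc", "\u0221"]

index_built_before_upgrade = build_index(rows, old_ucd)

# After the simulated upgrade, the expression is evaluated with new tables,
# but the index still holds answers computed with the old ones.
stale = [r for r in rows
         if index_built_before_upgrade[r] != matches_alpha(r, new_ucd)]

print([hex(ord(r)) for r in stale])
```

Only the row containing the newly assigned code point disagrees with its stored index entry. That is the kind of affected expression index that would need a rebuild after the classification tables change.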
I think one's perspective on the relative importance of the step forward and
the step backward depends on the sort of packages one uses today. Consider a
user of Debian packages with locale!=C, doing Debian upgrades and pg_upgrade.
For that user, pg_c_utf8 gets far less index corruption than an ICU locale.
The step forward is a great step forward _for this user_, and the step
backward is in the noise next to the step forward.
I'm with a different kind of packager. I don't tolerate index scans returning
wrong answers. To achieve that, my libc and ICU aren't changing collation
behavior. I suspect my packages won't offer a newer ICU behavior until
PostgreSQL gets support for multiple ICU library versions per database. (SQL
Server, DB2 and Oracle already do. I agree we can't freeze forever. The
multiple-versions feature gets more valuable each year.) _For this_ kind of
package, the step forward is a no-op. The step backward is the sole effect on
this kind of package.
How much does that pair of perspectives explain the contrast between my
"revert" and your "great step forward"? We may continue to disagree on the
ultimate decision, but I hope I can make my position cease to appear bizarre
to you.
Thanks,
nm
[1] Daniel Verite said packagers could patch src/Makefile.global.in and run
"make -C src/common/unicode update-unicode". Editing src/Makefile.global.in
is modifying PostgreSQL, not configuring a packager-facing option.