From: | Jeremy Schneider <schneider(at)ardentperf(dot)com> |
---|---|
To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
Cc: | Daniel Verite <daniel(at)manitou-mail(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [18] Policy on IMMUTABLE functions and Unicode updates |
Date: | 2024-07-23 12:31:56 |
Message-ID: | CA+fnDAbmn2d5tzZsj-4wmD0jApHTsg_zGWUpteb=OMSsX5rdAg@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jul 23, 2024 at 1:11 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> On Mon, 2024-07-22 at 13:55 -0400, Robert Haas wrote:
> > On Mon, Jul 22, 2024 at 1:18 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > > I understand the difficulty (madness) of discussing every Unicode
> > > change. If that's unworkable, my preference would be to stick with some
> > > Unicode version and never modify it, ever.
> >
> > I think that's a completely non-viable way forward. Even if everyone
> > here voted in favor of that, five years from now there will be someone
> > who shows up to say "I can't use your crappy software because the
> > Unicode tables haven't been updated in five years, here's a patch!".
> > And, like, what are we going to do? Still keeping shipping the 2024
> > version of Unicode four hundred years from now, assuming humanity and
> > civilization and PostgreSQL are still around then? Holding something
> > still "forever" is just never going to work.
>
> I hear you. It would be interesting to know what other RDBMS do here.

As far as I know, other RDBMSs are very careful not to corrupt databases,
including function-based indexes, by changing Unicode versions. I’m not
aware of any other RDBMS that updates its Unicode version in place; instead,
they support multiple Unicode versions and do not drop the old ones.
See also:
https://www.postgresql.org/message-id/E8754F74-C65F-4A1A-826F-FD9F37599A2E%40ardentperf.com

I know Jeff mentioned that the Unicode tables copied into Postgres for
normalization have been updated a few times. Did anyone ever actually
discuss the fact that things like function-based indexes can be corrupted
by this, and weigh the trade-offs? Are there past mailing list threads that
touch on the corruption problem and argue why updating anyway is the right
thing to do? I always assumed that nobody had really dug deeply into this
before the last few years.
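
To make the index-corruption concern concrete, here is a toy sketch in
Python, not Postgres code. It assumes Python's `unicodedata` module as a
stand-in: its `ucd_3_2_0` object exposes the Unicode 3.2 tables next to the
interpreter's current ones, playing the role of an old and a new release.
The dict-based "index", the sample rows, and the helper names (`nfkc_old`,
`nfkc_new`, `lookup`, `seq_scan`) are all hypothetical illustrations.

```python
import unicodedata

# Hypothetical stand-ins: the Unicode 3.2 tables play the role of the tables
# shipped with an older release; the interpreter's tables, a newer release.
def nfkc_old(s: str) -> str:
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)

def nfkc_new(s: str) -> str:
    return unicodedata.normalize("NFKC", s)

# U+1F100 DIGIT ZERO FULL STOP was assigned in Unicode 5.2, after 3.2,
# so only the newer tables know its compatibility decomposition ("0.").
rows = {1: "\U0001F100", 2: "plain"}

# Toy "expression index" (normalized key -> row ids), built with old tables.
index = {}
for row_id, value in rows.items():
    index.setdefault(nfkc_old(value), []).append(row_id)

def lookup(value: str, normalize=nfkc_new):
    # After the "upgrade", lookups compute keys with the new tables by default,
    # but the stored keys were computed with the old ones.
    return index.get(normalize(value), [])

def seq_scan(value: str, normalize=nfkc_new):
    # Recompute the expression for every row, as a sequential scan would.
    key = normalize(value)
    return [rid for rid, v in rows.items() if normalize(v) == key]

print(lookup("plain"))                            # [2]
print(lookup("\U0001F100"))                       # [] : row 1 is indexed but unreachable
print(seq_scan("\U0001F100"))                     # [1] : index scan and seq scan disagree
print(lookup("\U0001F100", normalize=nfkc_old))   # [1] : pinning the old tables stays consistent
```

Nothing in the stored rows changes here; only the function that produces
the keys does, which is why this kind of damage stays silent until a lookup
misses. The last line hints at the multiple-versions approach: keys stay
consistent as long as the index keeps using the tables it was built with.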

I do agree it isn’t as broad a problem as linguistic collation itself,
which causes much more widespread corruption when it changes (as we’ve
seen with glibc 2.28, and in older pgsql-hackers threads about smaller
changes in earlier glibc versions corrupting databases). For now, Postgres’s
own Unicode support covers only code-point collation and the other Unicode
functions mentioned in this thread.

-Jeremy