Re: [18] Policy on IMMUTABLE functions and Unicode updates

From: Jeremy Schneider <schneider(at)ardentperf(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date: 2024-07-23 12:31:56
Message-ID: CA+fnDAbmn2d5tzZsj-4wmD0jApHTsg_zGWUpteb=OMSsX5rdAg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 23, 2024 at 1:11 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
wrote:

> On Mon, 2024-07-22 at 13:55 -0400, Robert Haas wrote:
> > On Mon, Jul 22, 2024 at 1:18 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > > I understand the difficulty (madness) of discussing every Unicode
> > > change. If that's unworkable, my preference would be to stick with
> > > some Unicode version and never modify it, ever.
> >
> > I think that's a completely non-viable way forward. Even if everyone
> > here voted in favor of that, five years from now there will be someone
> > who shows up to say "I can't use your crappy software because the
> > Unicode tables haven't been updated in five years, here's a patch!".
> > And, like, what are we going to do? Still keep shipping the 2024
> > version of Unicode four hundred years from now, assuming humanity and
> > civilization and PostgreSQL are still around then? Holding something
> > still "forever" is just never going to work.
>
> I hear you. It would be interesting to know what other RDBMS do here.

Other RDBMS are very careful not to corrupt databases (afaik including
function-based indexes) by changing Unicode. I’m not aware of any other
RDBMS that updates Unicode versions in place; instead they support multiple
Unicode versions and do not drop the old ones.

See also:
https://www.postgresql.org/message-id/E8754F74-C65F-4A1A-826F-FD9F37599A2E%40ardentperf.com

I know Jeff mentioned that the Unicode tables copied into Postgres for
normalization have been updated a few times. Did anyone ever actually
discuss the fact that things like function-based indexes can be corrupted
by this, and weigh the reasoning? Are there past mailing list threads
touching on the corruption problem and making the argument for why updating
anyway is the right thing to do? I always assumed that nobody had really
dug deeply into this before the last few years.
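
To make the corruption mechanism concrete, here is a toy sketch in Python. It is
not Postgres btree code, and fold_old/fold_new are hypothetical stand-ins for
one IMMUTABLE function before and after a Unicode table update. The index is
built with the old behavior and then probed with the new one:

```python
import bisect

# Hypothetical stand-ins for a single IMMUTABLE function as it behaves
# under two different Unicode versions (assumption for illustration only).
def fold_old(s: str) -> str:
    return s.replace("A", "a")

def fold_new(s: str) -> str:
    # The "updated tables" now also map "Z" to "z".
    return s.replace("A", "a").replace("Z", "z")

rows = ["Zed", "Abe", "bob", "amy"]

# Build the "expression index": keys are stored sorted by the OLD output.
index = sorted((fold_old(r), r) for r in rows)
keys = [k for k, _ in index]

def lookup(value: str, fold) -> bool:
    """Binary-search the stored keys using whichever function is current."""
    key = fold(value)
    i = bisect.bisect_left(keys, key)
    return i < len(keys) and keys[i] == key

found_before_update = lookup("Zed", fold_old)  # index and function agree
found_after_update = lookup("Zed", fold_new)   # stored key is stale
```

The row is still in the table, but after the "update" the index lookup misses
it, because the stored key was computed by the old function. Rebuilding the
index with the new function would fix the lookup, which is roughly what a
REINDEX does.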

I do agree it isn’t as broad a problem as linguistic collation itself,
which causes far more widespread corruption when it changes (as we’ve
seen with glibc 2.28, and in older pgsql-hackers threads about smaller
changes in older glibc versions corrupting databases). For now,
Postgres only has code-point collation and the other Unicode functions
mentioned in this thread.

-Jeremy
