Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

From: Noah Misch <noah(at)leadboat(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "Guo, Adam" <adamguo(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>
Subject: Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Date: 2024-05-18 21:45:46
Message-ID: 20240518214546.e8@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> On Sat, May 4, 2024 at 7:36 AM Joe Conway <mail(at)joeconway(dot)com> wrote:
> > On 5/3/24 11:44, Peter Eisentraut wrote:
> > > On 03.05.24 16:13, Tom Lane wrote:
> > >> Peter Eisentraut <peter(at)eisentraut(dot)org> writes:
> > >>> On 30.04.24 19:29, Tom Lane wrote:
> > >>>> Also, the bigger picture here is the seeming assumption that "if
> > >>>> we change pg_trgm then it will be safe to replicate from x86 to
> > >>>> arm". I don't believe that that's a good idea and I'm unwilling
> > >>>> to promise that it will work, regardless of what we do about
> > >>>> char signedness. That being the case, I don't want to invest a
> > >>>> lot of effort in the signedness issue. Option (1) is clearly
> > >>>> a small change with little if any risk of future breakage.
> > >>
> > >>> But note that option 1 would prevent some replication that is currently
> > >>> working.
> > >>
> > >> The point of this thread though is that it's working only for small
> > >> values of "work". People are rightfully unhappy if it seems to work
> > >> and then later they get bitten by compatibility problems.
> > >>
> > >> Treating char signedness as a machine property in pg_control would
> > >> signal that we don't intend to make it work, and would ensure that
> > >> even the most minimal testing would find out that it doesn't work.
> > >>
> > >> If we do not do that, it seems to me we have to buy into making
> > >> it work. That would mean dealing with the consequences of an
> > >> incompatible change in pg_trgm indexes, and then going through
> > >> the same dance again the next time(s) similar problems are found.
> > >
> > > Yes, that is understood. But anecdotally, replicating between x86-64 arm64 is
> > > occasionally used for upgrades or migrations. In practice, this appears to have
> > > mostly worked. If we now discover that it won't work with certain index
> > > extension modules, it's usable for most users. Even if we say, you have to
> > > reindex everything afterwards, it's probably still useful for these scenarios.

Like you, I would not introduce a ControlFileData block for sign-of-char,
given the signs of breakage in extension indexing only. That would lose much
useful migration capability. I'm sympathetic to Tom's point, which I'll
attempt to summarize as: a ControlFileData block is a promise we know how to
keep, whereas we should expect further bug discoveries without it. At the
same time, I put more weight on the apparently-wide span of functionality that
works fine. (I wonder whether any static analyzer does or practically could
detect char sign dependence with a decent false positive rate.)

> +1
>
> How about extending amcheck to support GIN and GIst indexes so that it
> can detect potential data incompatibility due to changing 'char' to
> 'unsigned char'? I think these new tests would be useful also for
> users to check if they really need to reindex indexes due to such
> changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> Users who upgraded to PG18 can run the new amcheck tests on the
> primary as well as the standby.

If I were standardizing pg_trgm on one or the other notion of "char", I would
choose signed char, since I think it's still the majority. More broadly, I
see these options to fix pg_trgm:

1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes.
2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes.
3. Offer both, as an upgrade path. For example, pg_trgm could have separate
operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running
pg_upgrade on an unsigned-char system would automatically map v17
gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any
architecture with upgrade-time scans.

Independently, having amcheck for GIN and/or GiST would be great.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2024-05-18 21:51:56 Re: First draft of PG 17 release notes
Previous Message Yasir 2024-05-18 21:41:39 Re: Ignore Visual Studio's Temp Files While Working with PG on Windows