Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "Guo, Adam" <adamguo(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>
Subject: Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Date: 2024-08-29 20:48:53
Message-ID: CAD21AoBO-iL=UgHKQrKE=7zBrMYR_fAhLb8AGo_kmNgmkPKuQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> >
> > How about extending amcheck to support GIN and GIst indexes so that it
> > can detect potential data incompatibility due to changing 'char' to
> > 'unsigned char'? I think these new tests would be useful also for
> > users to check if they really need to reindex indexes due to such
> > changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> > Users who upgraded to PG18 can run the new amcheck tests on the
> > primary as well as the standby.
>
> If I were standardizing pg_trgm on one or the other notion of "char", I would
> choose signed char, since I think it's still the majority. More broadly, I
> see these options to fix pg_trgm:
>
> 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes.
> 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes.

Even though it's true that signed char systems are the majority, it
would not be acceptable to force the need to scan pg_trgm indexes on
unsigned char systems.

> 3. Offer both, as an upgrade path. For example, pg_trgm could have separate
> operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running
> pg_upgrade on an unsigned-char system would automatically map v17
> gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any
> architecture with upgrade-time scans.

Very interesting idea. How can new v18 users use the correct operator
class? I don't want to require users to specify the correct signed or
unsigned operator classes when creating a GIN index. Maybe we need to
dynamically use the correct compare function for the same operator
class depending on the char signedness. But is it possible to do it on
the extension (e.g. pg_trgm) side?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-08-29 20:53:03 Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
Previous Message Jacob Champion 2024-08-29 20:44:01 Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible