From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Joe Conway <mail(at)joeconway(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "Guo, Adam" <adamguo(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com> |
Subject: | Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation |
Date: | 2024-08-31 03:10:38 |
Message-ID: | 20240831031038.31.nmisch@google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote:
> On Sun, May 19, 2024 at 6:46 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > If I were standardizing pg_trgm on one or the other notion of "char", I would
> > choose signed char, since I think it's still the majority. More broadly, I
> > see these options to fix pg_trgm:
> >
> > 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes.
> > 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes.
>
> Even though it's true that signed char systems are the majority, it
> would not be acceptable to force the need to scan pg_trgm indexes on
> unsigned char systems.
>
> > 3. Offer both, as an upgrade path. For example, pg_trgm could have separate
> > operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running
> > pg_upgrade on an unsigned-char system would automatically map v17
> > gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any
> > architecture with upgrade-time scans.
>
> Very interesting idea. How can new v18 users use the correct operator
> class? I don't want to require users to specify the correct signed or
> unsigned operator classes when creating a GIN index. Maybe we need to
In brief, it wouldn't matter which operator class new v18 indexes use. The
documentation would focus on gin_trgm_ops and also say something like:
There's an additional operator class, gin_trgm_ops_unsigned. It behaves
exactly like gin_trgm_ops, but it uses a deprecated on-disk representation.
Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing
to use gin_trgm_ops_unsigned. Before PostgreSQL 18, gin_trgm_ops used a
platform-dependent representation. pg_upgrade automatically uses
gin_trgm_ops_unsigned when upgrading from source data that used the
deprecated representation.
What concerns might users have, then? (Neither operator class would use plain
"char" in a context that affects on-disk state. They'll use "signed char" and
"unsigned char".)
> dynamically use the correct compare function for the same operator
> class depending on the char signedness. But is it possible to do it on
> the extension (e.g. pg_trgm) side?
No, I don't think the extension can do that cleanly. It would need to store
the signedness in the index somehow, and GIN doesn't call the opclass at a
time facilitating that. That would need help from the core server.
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2024-08-31 03:58:33 | not null constraints, again |
Previous Message | Thomas Munro | 2024-08-31 03:10:28 | Re: 039_end_of_wal: error in "xl_tot_len zero" test |