Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

From: Noah Misch <noah(at)leadboat(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "Guo, Adam" <adamguo(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>
Subject: Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Date: 2025-02-17 22:57:21
Message-ID: 20250217225721.44.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 17, 2025 at 05:11:41PM -0800, Masahiko Sawada wrote:
> Thank you for reviewing the patches. I agree with all the comments you
> made. I've addressed them and I've attached new version patches that
> now have some regression tests.

> Subject: [PATCH v3 4/5] pg_upgrade: Add --set-char-signedness to set the
> default char signedness of new cluster.
>
> This change adds a new option --set-char-signedness to pg_upgrade. It
> enables user to set arbitrary signedness during pg_upgrade. This helps
> cases where user who knew they copied the v17 source cluster from
> x86 (signedness=true) to ARM (signedness=falese) can pg_upgrade

s/falese/false/

> --- a/doc/src/sgml/ref/pgupgrade.sgml
> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -276,6 +276,54 @@ PostgreSQL documentation
> </listitem>
> </varlistentry>
>
> + <varlistentry>
> + <term><option>--set-char-signedness=</option><replaceable>option</replaceable></term>

If upgrading from v18+, valid use cases for this option are rare. I think
using the option should raise an error for v18+ source. A user who knows what
they're doing can use pg_resetwal manually, before the upgrade. This page
shouldn't refer to the pg_resetwal flag specifically, because that need is so
rare. What do you think?

> + <listitem>
> + <para>
> + Manually set the default char signedness of new clusters. Possible values
> + are <literal>signed</literal> and <literal>unsigned</literal>.
> + </para>
> + <para>
> + In the C language, the default signedness of the <type>char</type> type
> + (when not explicitly specified) varies across platforms. For example,
> + <type>char</type> defaults to <type>signed char</type> on x86 CPUs but
> + to <type>unsigned char</type> on ARM CPUs. When data stored using the
> + <type>char</type> type is persisted to disk, such as in GIN indexes,
> + this platform-dependent behavior results in incorrect data comparisons
> + in two scenarios:
> + </para>
> + <itemizedlist>
> + <listitem>
> + <para>
> + When data is moved between platforms with different char signedness.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + When data is replicated using streaming replication across different architectures.
> + </para>
> + </listitem>
> + </itemizedlist>

After your patch series, core PostgreSQL persistent data doesn't depend on
"char" signedness. Extensions should also start calling
GetDefaultCharSignedness() to become independent of "char" signedness. Hence,
let's remove the part starting at "When data stored using the char type" and
ending here. Future users don't need to understand why pre-v18 had a problem.
They mainly need to know how to set this flag.

If we wanted to say something about the breakage before v18, it might be, "If
a pre-v18 cluster has trgm indexes and ran on different platforms at different
times in the history of those indexes, REINDEX those indexes before or after
running pg_upgrade." I'd put that in the release notes, not here in the
pg_upgrade doc page.

> + <para>
> + Starting from <productname>PostgreSQL</productname> 18, database clusters
> + maintain their own default char signedness setting, which can be used as
> + a hint to ensure consistent behavior across platforms with different

Setting this wrong makes your trgm indexes return wrong answers, so it's not a
"hint". (I think "hint" implies minor consequences for getting it wrong.)

> + default char signedness. By default, <application>pg_upgrade</application>
> + preserves the char signedness setting when upgrading from an existing cluster.
> + However, when upgrading from <productname>PostgreSQL</productname> 17 or
> + earlier, <application>pg_upgrade</application> adopts the char signedness
> + of the platform on which it was built.
> + </para>
> + <para>
> + This option allows you to explicitly set the default char signedness for
> + the new cluster, overriding any inherited values. This is particularly useful
> + when you plan to migrate the upgraded cluster to a platform with different
> + char signedness (for example, when moving from an x86-based system to an
> + ARM-based system).

Let's refine the terminology around "plan to migrate". Here's an informal
description (not documentation-quality):

1. If you "plan to migrate" but have not yet done so, don't use this flag.
The default behavior is right. Upgrade on the old platform without using
this flag, and then migrate. This is the most-foolproof approach.

2. If you already migrated the cluster data files to a different platform just
before running pg_upgrade, use this flag. It's essential not to modify any
trgm indexes between migrating the data files and running pg_upgrade. Best
to make pg_upgrade the first action that starts the cluster on the new
platform, not starting the cluster manually. This is trickier than (1).

> --- a/src/bin/pg_upgrade/pg_upgrade.c
> +++ b/src/bin/pg_upgrade/pg_upgrade.c
> @@ -399,8 +399,14 @@ set_new_cluster_char_signedness(void)
> {
> bool new_char_signedness;
>
> - /* Inherit the source database's signedness */
> - new_char_signedness = old_cluster.controldata.default_char_signedness;
> + /*
> + * Use the specified char signedness if specifies. Otherwise we inherit

s/if specifies/if specified/

> --- a/contrib/pg_trgm/trgm.h
> +++ b/contrib/pg_trgm/trgm.h
> @@ -41,14 +41,12 @@
> typedef char trgm[3];
>
> #define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
> -#define CMPPCHAR(a,b,i) CMPCHAR( *(((const char*)(a))+i), *(((const char*)(b))+i) )
> -#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) )

I recommend moving CMPCHAR into the C file along with the two related macros
that you're moving.

> --- a/contrib/pg_trgm/trgm_op.c
> +++ b/contrib/pg_trgm/trgm_op.c
> @@ -42,6 +42,9 @@ PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op);
> PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op);
> PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op);
>
> +static inline int CMPTRGM_CHOOSE(const void *a, const void *b);

All instances of the "inline" keyword in this patch series are on functions
called only via function pointers. Hence, those functions are never inlined.
I'd remove the keyword.

> --- a/src/bin/pg_upgrade/pg_upgrade.h
> +++ b/src/bin/pg_upgrade/pg_upgrade.h
> @@ -125,6 +125,12 @@ extern char *output_files[];
> */
> #define JSONB_FORMAT_CHANGE_CAT_VER 201409291
>
> +/*
> + * The control file was changed to have the default char signedness,
> + * commit XXXXX.
> + */
> +#define DEFAULT_CHAR_SIGNEDNESS_VAT_VER 202501161

s/VAT/CAT/

> +command_like(
> + [ 'pg_controldata', $old->data_dir ],
> + qr/Default char data signedness:\s+signed/,
> + 'default char signedness of old cluster is signed in control file');
> +command_like(
> + [ 'pg_controldata', $new->data_dir ],
> + qr/Default char data signedness:\s+signed/,
> + 'default char signedness is new cluster signed in control file');

s/is new cluster signed/of new cluster is signed/ for consistency with the
previous test name.

> +
> +# Set the old cluster's default char signedness to unsigned for test.
> +command_ok(
> + [
> + 'pg_resetwal', '--char-signedness', 'unsigned',
> + '-f', $old->data_dir
> + ],
> + "set old cluster's default char signeness to unsigned");

s/signeness/signedness/

> Subject: [PATCH v3 2/5] pg_resetwal: Add --char-signedness option to change
> the default char signedness.
>
> With the newly added option --char-signedness, pg_resetwal updates the
> default char signeness flag in the controlfile.

s/signeness/signedness/

> --- a/doc/src/sgml/ref/pg_resetwal.sgml
> +++ b/doc/src/sgml/ref/pg_resetwal.sgml
> @@ -171,6 +171,20 @@ PostgreSQL documentation
> </para>
>
> <variablelist>
> + <varlistentry>
> + <term><option>--char-signedness=<replaceable class="parameter">option</replaceable></option></term>
> + <listitem>
> + <para>
> + Manually set the default char signedness. Possible values are
> + <literal>signed</literal> and <literal>unsigned</literal>.
> + </para>
> + <para>
> + This option can also be used to change the default char signedness
> + of an existing database cluster.
> + </para>
> + </listitem>
> + </varlistentry>

Each other list entry discusses how to choose a safe value, so this should say
something like that. For users in doubt, nothing other than pg_upgrade should
use this flag. Theoretically, if you just ran pg_upgrade with the wrong value
and had not yet modified any trgm indexes, you could run this to fix the
problem. That's too rare of a situation and too dangerous to document, so the
documentation for this flag should discourage using the flag.

> --- a/src/bin/pg_resetwal/pg_resetwal.c
> +++ b/src/bin/pg_resetwal/pg_resetwal.c
> @@ -105,6 +106,7 @@ main(int argc, char *argv[])
> {"multixact-offset", required_argument, NULL, 'O'},
> {"oldest-transaction-id", required_argument, NULL, 'u'},
> {"next-transaction-id", required_argument, NULL, 'x'},
> + {"char-signedness", required_argument, NULL, 2},
> {"wal-segsize", required_argument, NULL, 1},

Put the ", 2}" entry after the ", 1}" entry. (We alphabetize in usage(), but
we append to the end in these C arrays.)

> @@ -1189,6 +1213,7 @@ usage(void)
> printf(_(" -u, --oldest-transaction-id=XID set oldest transaction ID\n"));
> printf(_(" -x, --next-transaction-id=XID set next transaction ID\n"));
> printf(_(" --wal-segsize=SIZE size of WAL segments, in megabytes\n"));
> + printf(_(" --char-signedness=OPTION set char signedness to \"signed\" or \"unsigned\"\n"));

This --help output alphabetizes by short option, with the one non-short option
at the end. So I'd also alphabetize among the non-short options, specifically
putting the new line before --wal-segsize.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2025-02-17 23:26:23 Re: TOAST versus toast
Previous Message Peter Geoghegan 2025-02-17 22:44:32 Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)