From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "Guo, Adam" <adamguo(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com> |
Subject: | Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation |
Date: | 2024-10-01 18:55:48 |
Message-ID: | CAD21AoDYBNG-EsgDy-sMiT6cinmrvsKe6N5BVOhizUb1k_ta2g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Sep 30, 2024 at 11:49 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> On Wed, Sep 25, 2024 at 03:46:27PM -0700, Masahiko Sawada wrote:
> > On Wed, Sep 18, 2024 at 6:46 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > On Tue, Sep 17, 2024 at 09:43:41PM -0700, Masahiko Sawada wrote:
> > > > On Mon, Sep 16, 2024 at 9:24 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > > On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote:
> > > > > > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > > > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote:
> > > > > > > > Got it. So now I'm wondering if we need all the complexity of storing
> > > > > > > > stuff in the GIN metapages. Could we simply read the (primary's)
> > > > > > > > signedness out of pg_control and use that?
>
> > > > Thanks. I like the simplicity of this approach. If we agree with this
> > > > approach, I'd like to proceed with it.
> > >
> > > Works for me.
> > >
> > > > Regardless of what approach we take, I wanted to provide some
> > > > regression tests for these changes, but I could not come up with a
> > > > reasonable idea. It would be great if we could do something like
> > > > 027_stream_regress.pl on cross-architecture replication. But just
> > > > executing 027_stream_regress.pl on cross-architecture replication
> > > > could not be sufficient since we would like to confirm query results
> > > > as well. If we could have a reasonable tool or way, it would also help
> > > > find other similar bugs related architectural differences.
> > >
> > > Perhaps add a regress.c function that changes the control file flag and
> > > flushes the change to disk?
> >
> > I've tried this idea but found out that extensions can flush the
> > controlfile on the disk after flipping the char signedness value, they
> > cannot update the controlfile data on the shared memory. And, when the
> > server shuts down, the on-disk controlfile is updated with the
> > on-memory controlfile data, so the char signedness is reverted.
> >
> > We can add a function to set the char signedness of on-memory
> > controlfile data or add a new option to pg_resetwal to change the char
> > signedness on-disk controlfile data but they seem to be overkill to me
> > and I'm concerned about misusing these option and function.
>
> Before the project is over, pg_upgrade will have some way to set the control
> file signedness to the source cluster signedness. I think pg_upgrade should
> also take an option for overriding the source cluster signedness for source
> clusters v17 and older. That way, a user knowing they copied the v17 source
> cluster from x86 to ARM can pg_upgrade properly without the prerequisite of
> acquiring an x86 VM.
Good idea.
> Once that all exists, perhaps it will be clearer how
> best to change signedness for testing.
Agreed.
>
> > While this change does not encourage the use of 'char' without
> > explicitly specifying its signedness, it provides a hint to ensure
> > consistent behavior for indexes (e.g., GIN and GiST) and tables that
> > already store data sorted by the 'char' type on disk, especially in
> > cross-platform replication scenarios.
>
> Let's put that sort of information in the GetDefaultCharSignedness() header
> comment. New code should use explicit "signed char" or "unsigned char" when
> it matters for data file compatibility. GetDefaultCharSignedness() exists for
> code that deals with pre-v18 data files, where we accidentally let C
> implementation signedness affect persistent data.
Agreed, will fix.
>
> > @@ -4256,6 +4257,18 @@ WriteControlFile(void)
> >
> > ControlFile->float8ByVal = FLOAT8PASSBYVAL;
> >
> > + /*
> > + * The signedness of the char type is implementation-defined. For example
> > + * on x86 architecture CPUs, the char data type is typically treated as
> > + * signed by default whereas on aarch architecture CPUs it is typically
> > + * treated as unsigned by default.
> > + */
> > +#if CHAR_MIN != 0
> > + ControlFile->default_char_signedness = true;
> > +#else
> > + ControlFile->default_char_signedness = false;
> > +#endif
>
> This has initdb set the field to the current C implementation's signedness. I
> wonder if, instead, initdb should set signedness=true unconditionally. Then
> the only source of signedness=false will be pg_upgrade.
>
> Advantage: signedness=false will get rarer over time. If we had known about
> this problem during the last development cycle that forced initdb (v8.3), we
> would have made all clusters signed or all clusters unsigned. Making
> pg_upgrade the only source of signedness=false will cause the population of
> database clusters to converge toward that retrospective ideal.
I think it's a good idea. Being able to treat one case as a rarer one
rather than treating both cases equally may have various advantages in
the future, for example when developing or investigating a problem.
> Disadvantage: testing signedness=false will require more planning.
If testing signedness=false always requires pg_upgrade, there might be
some cumbersomeness. Inventing a testing-purpose-only tool (e.g., a
CLI program) that changes the signedness might make tests easier.
> What other merits should we consider as part of deciding?
Considering that the population of database cluster signedness will
converge to signedness=true in the future, we can consider using
-fsigned-char to prevent similar problems for the future. We need to
think about possible side-effects as well, though.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Lakhin | 2024-10-01 19:00:00 | Re: query_id, pg_stat_activity, extended query protocol |
Previous Message | Daniel Gustafsson | 2024-10-01 18:55:34 | Re: Changing the state of data checksums in a running cluster |