Re: race condition when writing pg_control

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition when writing pg_control
Date: 2025-03-26 09:12:11
Message-ID: CALDaNm1NFJD2ea7t1F8u6ZF4xzuejNfmPknHfZgrgqcXjFuRMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 31 Jul 2024 at 05:25, Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> On Mon, Jul 15, 2024 at 03:44:48PM +1200, Thomas Munro wrote:
> > On Fri, Jul 12, 2024 at 11:43 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > On Sat, May 18, 2024 at 05:29:12PM +1200, Thomas Munro wrote:
> > > > On Fri, May 17, 2024 at 4:46 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > > > > The specific problem here is that LocalProcessControlFile() runs in
> > > > > every launched child for EXEC_BACKEND builds. Windows uses
> > > > > EXEC_BACKEND, and Windows' NTFS file system is one of the two file
> > > > > systems known to this list to have the concurrent read/write data
> > > > > mashing problem (the other being ext4).
> > >
> > > > First idea idea I've come up with to avoid all of that: pass a copy of
> > > > the "proto-controlfile", to coin a term for the one read early in
> > > > postmaster startup by LocalProcessControlFile(). As far as I know,
> > > > the only reason we need it is to suck some settings out of it that
> > > > don't change while a cluster is running (mostly can't change after
> > > > initdb, and checksums can only be {en,dis}abled while down). Right?
> > > > Children can just "import" that sucker instead of calling
> > > > LocalProcessControlFile() to figure out the size of WAL segments yada
> > > > yada, I think? Later they will attach to the real one in shared
> > > > memory for all future purposes, once normal interlocking is allowed.
> > >
> > > I like that strategy, particularly because it recreates what !EXEC_BACKEND
> > > backends inherit from the postmaster. It might prevent future bugs that would
> > > have been specific to EXEC_BACKEND.
> >
> > Thanks for looking! Yeah, that is a good way to put it.
>
> Oops, the way I put it turned out to be false. Postmaster has ControlFile
> pointing to shared memory before forking backends, so !EXEC_BACKEND children
> are born that way. In the postmaster, ControlFile->checkPointCopy->redo does
> change after each checkpoint.
>
> > The only other idea I can think of is that the Postmaster could take
> > all of the things that LocalProcessControlFile() wants to extract from
> > the file, and transfer them via that struct used for EXEC_BACKEND as
> > individual variables, instead of this new proto-controlfile copy. I
> > think it would be a bigger change with no obvious-to-me additional
> > benefit, so I didn't try it.
>
> Yeah, that would be more future-proof but a bigger change. One could argue
> for a yet-larger refactor so even the !EXEC_BACKEND case doesn't read those
> fields from ControlFile memory. Then we could get rid of ControlFile ever
> being set to something other than NULL or a shmem pointer. ControlFileData's
> mix of initdb-time fields, postmaster-start-time fields, and changes-anytime
> fields is inconvenient here.
>
> The unknown is the value of that future proofing. Much EXEC_BACKEND early
> startup code is shared with postmaster startup, which can assume it's the only
> process. I can't rule out a future bug where that shared code does a read
> that's harmless in postmaster startup but harmful when an EXEC_BACKEND child
> runs the same read. For a changes-anytime field, the code would already be
> subtly buggy in EXEC_BACKEND today, since it would be reading without an
> LWLock. For a postmaster-start-time field, things should be okay so long as
> we capture the proto ControlFileData after the last change to such fields.
> That invariant is not trivial to achieve, but it's not gravely hard either.
>
> A possible middle option would be to use the proto control file, but
> explicitly set its changes-anytime fields to bogus values.
>
> What's your preference?

@Thomas Munro , others - If anyone has suggestions on which approach
might work best, please share them to help move this discussion
forward.

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2025-03-26 09:12:32 Re: Allow default \watch interval in psql to be configured
Previous Message Daniel Gustafsson 2025-03-26 09:10:57 Re: Possibly hard-to-read message