Re: race condition when writing pg_control

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition when writing pg_control
Date: 2020-05-26 19:30:54
Message-ID: D162D812-D724-4B79-8D6C-CB1EFC9A796E@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5/22/20, 10:40 PM, "Michael Paquier" <michael(at)paquier(dot)xyz> wrote:
> On Sat, May 23, 2020 at 01:00:17AM +0900, Fujii Masao wrote:
>> Per my quick check, XLogReportParameters() seems to have the similar issue,
>> i.e., it updates the control file without taking ControlFileLock.
>> Maybe we should fix this at the same time?
>
> Yeah. It also checks the control file values, implying that we should
> have LW_SHARED taken at least at the beginning, but this lock cannot
> be upgraded we need LW_EXCLUSIVE the whole time. I am wondering if we
> should check with an assert if ControlFileLock is taken when going
> through UpdateControlFile(). We have one code path at the beginning
> of redo where we don't need a lock close to the backup_label file
> checks, but we could just pass down a boolean flag to the routine to
> handle that case. Another good thing in having an assert is that any
> new caller of UpdateControlFile() would need to think about the need
> of a lock.

While an assertion in UpdateControlFile() would not have helped us
catch the problem I initially reported, it does seem worthwhile to add
it. I have attached a patch that adds this assertion and also
attempts to fix XLogReportParameters(). Since there is only one place
where we feel it is safe to call UpdateControlFile() without a lock, I
just changed it to take the lock. I don't think this adds any sort of
significant contention risk, and IMO it is a bit cleaner than the
boolean flag.

For the XLogReportParameters() fix, I simply added an exclusive lock
acquisition for the portion that updates the values in shared memory
and calls UpdateControlFile(). IIUC the first part of this function
that accesses several ControlFile values should be safe, as none of
them can be updated after server start.

Nathan

Attachment Content-Type Size
v1-0001-Assert-that-ControlFileLock-is-held-exclusively-i.patch application/octet-stream 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-05-26 20:06:36 Re: some grammar refactoring
Previous Message Alvaro Herrera 2020-05-26 19:17:10 Re: hash join error improvement (old)