Re: Problem while setting the fpw with SIGHUP

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: hlinnaka(at)iki(dot)fi, amit(dot)kapila16(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem while setting the fpw with SIGHUP
Date: 2018-04-12 05:07:53
Message-ID: 20180412050753.GA19289@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 12, 2018 at 10:34:30AM +0900, Kyotaro HORIGUCHI wrote:
> Checkpointer never calls CreateCheckPoint while
> RecoveryInProgress() == true. In other words, checkpointer is not
> an updator of shared FPW at the time StartupXLOG calls
> CreateCheckPoint for fallback_promote.

I have been able to spend a couple of hours on your patch, wrapping my
mind on your stuff. So what I had in mind was something like this type
of scenario:
1) The startup process requires a restart point.
2) The checkpointer receives the request, and blocks before reading
RecoveryInProgress().
3) A fallback_promote is triggered, making the startup process call
CreateCheckpoint().
4) Startup process finishes checkpoint, updates Insert->fullPageWrites.
5) Checkpoint reads RecoveryInProgress to false, moves on with its
checkpoint.

> The comment may be somewhat confusing that it is written
> there. The point is that checkpointer and StartupXLOG are
> mutually excluded on updating shared FPW by
> SharedRecoveryInProgress flag.

Indeed. I can see that it is the main key point of the patch.

> | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
> | * the flag actually takes effect. Checkpointer never calls this function
> | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no
> | * window where checkpointer and startup processes - the only updators of
> | * the flag - can update shared FPW simultaneously. Thus no lock is
> | * required here. Both shared and local fullPageWrites do not change
> | * before the next reading below.

Yeah, this reduces the confusion.

(The latest patch is a mix of two patches.)

+ The default is <literal>on</literal>. The change of the parmeter takes
+ effect at the next checkpoint time.
s/parmeter/parameter/

By the way, I would vote for keeping track in WAL of full_page_writes
switched from off to on. This is not used in the backend, but that's
still useful for debugging end-user issues.

Actually, I was wondering why a spin lock is not taken in
RecoveryInProgress when reading SharedRecoveryInProgress, but that's
from 1a3d1044 which added a memory barrier instead as guarantee...
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2018-04-12 06:11:20 Re: Creation of wiki page for open items of v11
Previous Message David Rowley 2018-04-12 04:40:12 Re: [HACKERS] Runtime Partition Pruning