Re: Is Recovery actually paused?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Is Recovery actually paused?
Date: 2021-02-09 04:17:58
Message-ID: CAFiTN-sYvrjkUCBsGU6040BNH56LJCpBgJJDMdUW8zF1uoLE+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
> On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> > > >
> > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > >
> > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote in
> > > > > > > > > I think the right fix should be that the state should never go from
> > > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take
> > > > > > > > > care of that.
> > > > > > > >
> > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > > > > > > > it can not handle the case that a user resume and pause again while a sleep.
> > > > > > >
> > > > > > > Right, we will have to check and set in the loop. But we should not
> > > > > > > allow the state to go from paused to pause requested irrespective of
> > > > > > > this.
> > > > > >
> > > > > > I agree with you.
> > > > >
> > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks?
> > > >
> > > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > > observe 'pause requested' during a sleep alghough the time window is short.
> > > > It seems a bit odd that pg_wal_replay_pause changes the state like this
> > > > because This state meeans that recovery may not be 'paused'.
> > >
> > > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > > requested'. the logical state transition should always be as below
> > >
> > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > > request and then paused but there is nothing wrong with going to
> > > paused)
> > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get paused)
> > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > > PAUSE_REQUESTED without going to NOT PAUSED)
> >
> > I didn't asked about the internal logical correctness, but asked about
> > *actual harm* revealed to users. I don't see any actual harm in the
> > "wrong" transition because:
>
> Actually, the incorrect state transition is not so harmful except that
> users can observe unnecessary state changes. However, I don't think any
> actual harm in prohibit the incorrect state transition. So, I think we
> can do it.
>
> > If we are going to introduce that complexity, I'd like to re-propose
> > to introduce interlocking between the recovery side and the
> > pause-requestor side instead of introducing the intermediate state,
> > which is the cause of the complexity.
> >
> > The attached PoC patch adds:
> >
> > - A solid checkpoint just before calling rm_redo. It doesn't add a
> > info_lck since the check is done in the existing lock section.
> >
> > - Interlocking between the above and SetRecoveryPause without adding a
> > shared variable.
> > (This is what I called "synchronous" before.)
>
> I think waiting in pg_wal_replay_pasue is a possible option, but this will
> also introduce other complexity to codes such as possibility of waiting for
> long or for ever. For example, waiting in SetRecoveryPause as in your POC
> patch appears to make recovery stuck in RecoveryRequiresIntParameter.
>

I agree with this, I think we previously discussed these approaches
where we can wait in pg_wal_replay_pasue() or
pg_is_wal_replay_pasued(). In fact, we had an older version where we
put the wait in pg_is_wal_replay_pasued(). But it appeared that doing
so will add extra complexity as well as instead of waiting in these
APIs the wait logic can be implemented in the application code which
is actually using these APIs and IMHO that will give better control to
the users.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-02-09 04:28:30 Re: Is Recovery actually paused?
Previous Message Dilip Kumar 2021-02-09 04:13:39 Re: Is Recovery actually paused?