From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
Subject: | Re: Is Recovery actually paused? |
Date: | 2021-01-25 04:35:19 |
Message-ID: | CAFiTN-sv5ih7D7dZHbHD-eBL8WPphn4hp1z4BBAW26_+us572w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > > >>
> > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >> > Please find the patch for the same. I haven't added a test case for
> > > >> > this yet. I mean we can write a test case to pause the recovery and
> > > >> > get the status. But I am not sure that we can really write a reliable
> > > >> > test case for 'pause requested' and 'paused'.
> > > >>
> > > >> +1 to just show the recovery pause state in the output of
> > > >> pg_is_wal_replay_paused. But, should the function name
> > > >> "pg_is_wal_replay_paused" be something like
> > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > >> in a function, I expect a boolean output. Others may have better
> > > >> thoughts.
> > > >>
> > > >> IIUC the above change, ensuring the recovery is paused after it's
> > > >> requested lies with the user. IMHO, the main problem we are trying to
> > > >> solve is not met
> > > >
> > > >
> > > > Basically earlier their was no way for the user yo know whether the recovery is actually paused or not because it was always returning true after pause requested. Now, we will return whether pause requested or actually paused. So > for tool designer who want to wait for recovery to get paused can have a loop and wait until the recovery state reaches to paused. That will give a better control.
> > >
> > > I get it and I agree to have that change. My point was whether we can
> > > have a new function pg_wal_replay_pause_and_wait that waits until
> > > recovery is actually paused ((along with pg_is_wal_replay_paused
> > > returning the actual state than a true/false) so that tool developers
> > > don't need to have the waiting code outside, if at all they care about
> > > it? Others may have better thoughts than me.
> >
> > I think the previous patch was based on that idea where we thought
> > that we can pass an argument to pg_is_wal_replay_paused which can
> > decide whether to wait or return without the wait. I think this
> > version looks better to me where we give the status instead of
> > waiting. I am not sure whether we want another version of
> > pg_wal_replay_pause which will wait for actually it to get paused. I
> > mean there is always a scope to include the functionality in the
> > database which can be achieved by the tool, but this patch was trying
> > to solve the problem that there was no way to know the status so I
> > think returning the correct status should be the scope of this.
>
> I understand that the requirement here is that no record is replayed
> after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused()
> returns true, and delays taken while recovery don't delay the state
> change. That requirements are really synchronous.
>
> On the other hand the machinery is designed to be asynchronous.
>
> > * Note that we intentionally don't take the info_lck spinlock
> > * here. We might therefore read a slightly stale value of
> > * the recoveryPause flag, but it can't be very stale (no
> > * worse than the last spinlock we did acquire). Since a
> > * pause request is a pretty asynchronous thing anyway,
> > * possibly responding to it one WAL record later than we
> > * otherwise would is a minor issue, so it doesn't seem worth
> > * adding another spinlock cycle to prevent that.
>
> As the result, this patch tries to introduce several new checkpoints
> to some delaying point so that that waits can find pause request in a
> timely manner. I think we had better use locking (or atomics) for the
> information instead of such scattered checkpoints if we expect that
> machinery to work in a such syhcnronous manner.
>
> That would make the tri-state state variable and many checkpoints
> unnecessary. Maybe.
I don't think the intention was so to make it synchronous, I think
the main intention was that pg_is_wal_replay_paused can return us the
correct state, in short user can know that whether any more wal will
be replayed after pg_is_wal_replay_paused returns true or some other
state. I agree that along with that we have also introduced some
extra checkpoint where the recovery process is waiting for WAL and
apply delay and from the pg_wal_replay_pause we had sent a signal to
wakeup the recovery process. So I am not sure is it worth adding the
lock/atomic variable to make this synchronous. Any other thoughts on
this?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2021-01-25 04:58:01 | Re: About to add WAL write/fsync statistics to pg_stat_wal view |
Previous Message | Masahiro Ikeda | 2021-01-25 04:28:20 | Re: About to add WAL write/fsync statistics to pg_stat_wal view |