Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka(at)iki(dot)fi, alvherre(at)alvh(dot)no-ip(dot)org, pashkin(dot)elfe(at)gmail(dot)com, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, euler(at)eulerto(dot)com, thomas(dot)munro(at)gmail(dot)com, peter(at)eisentraut(dot)org, amit(dot)kapila16(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-07-15 01:24:19
Message-ID: CAPpHfdvwMn21n-zWecvr5gEzFy7rmszwbOTekjTE4taCe9frBw@mail.gmail.com
Lists: pgsql-hackers

Thanks to Kyotaro for the review. And thanks to Ivan for the patch
revision. I made another revision of the patch.

On Tue, Jul 9, 2024 at 12:51 PM Kartyshov Ivan
<i(dot)kartyshov(at)postgrespro(dot)ru> wrote:
> Thank you for your interest in the patch.
>
> On 2024-06-20 11:30, Kyotaro Horiguchi wrote:
> > Hi, I looked through the patch and have some comments.
> >
> >
> > ====== L68:
> > + <title>Recovery Procedures</title>
> >
> > It looks somewhat confusing and appears as if the section is intended
> > to explain how to perform recovery. Since this is the first built-in
> > procedure, I'm not sure how this section should be written. However,
> > the section immediately above is named "Recovery Control Functions",
> > so "Recovery Synchronization Functions" would align better with the
> > naming of the upper section. (I don't believe we need to be so strict
> > about the distinction between functions and procedures here.)
> >
> > It looks strange that the procedure signature includes the return type.
>
> Good point, changed:
> Recovery Procedures -> Recovery Synchronization Procedures

Thank you, looks good to me.

> > ====== L93:
> > + If <parameter>timeout</parameter> is not specified or zero, this
> > + procedure returns once WAL is replayed up to
> > + <literal>target_lsn</literal>.
> > + If the <parameter>timeout</parameter> is specified (in
> > + milliseconds) and greater than zero, the procedure waits until the
> > + server actually replays the WAL up to <literal>target_lsn</literal>
> > + or until the given time has passed. On timeout, an error is
> > + emitted.
> >
> > The first sentence should mention the main functionality. Following
> > precedents, it might be better to use something like "Waits until
> > recovery surpasses the specified LSN. If no timeout is specified or it
> > is set to zero, this procedure waits indefinitely for the LSN. If the
> > timeout is specified (in milliseconds) and is greater than zero, the
> > procedure waits until the LSN is reached or the specified time has
> > elapsed. On timeout, or if the server is promoted before the LSN is
> > reached, an error is emitted."
> >
> > The detailed explanation that follows the above seems somewhat too
> > verbose to me, as other functions don't have such detailed examples.
>
> Please suggest your own description; I think it would be better.

Kyotaro actually provided a paragraph in his message. I've integrated
it into the patch.

> > ====== L484
> > /*
> > + * Set latches for processes whose waited LSNs are already replayed.
> > + * This involves spinlocks, so we shouldn't do it under a spinlock.
> > + */
> >
> > Here, I'm not quite sure what the spinlock (or mutex?) specifically
> > refers to. However, more importantly, shouldn't we explain that it
> > is okay not to use any locks at all, rather than mentioning that
> > spinlocks should not be used here? I found a similar case around
> > freelist.c:238, which is written as follows.
> >
> >> * Not acquiring ProcArrayLock here which is slightly icky. It's
> >> * actually fine because procLatch isn't ever freed, so we just can
> >> * potentially set the wrong process' (or no process') latch.
> >> */
> >> SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch);
>
> ???

I've revised the comment.
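
For illustration, the pattern the revised comment describes looks roughly
like this (a sketch with made-up names, not the exact patch code); it
mirrors the freelist.c precedent of setting a latch without holding any
lock, using only symbols from storage/proc.h and storage/latch.h:

    /*
     * Wake up processes whose waited LSN has been replayed.  We take no
     * lock here.  That is fine because procLatch is never freed, so at
     * worst we set the latch of a process that has already stopped
     * waiting, and every latch waiter must tolerate spurious wakeups
     * anyway.
     */
    for (int i = 0; i < numWakeUpProcs; i++)
        SetLatch(&ProcGlobal->allProcs[wakeUpProcNums[i]].procLatch);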

> > ===== L518
> > +void
> > +WaitForLSN(XLogRecPtr targetLSN, int64 timeout)
> >
> > This function is only called within the same module. I'm not sure if
> > we need to expose it. If we do, the name should probably be more
> > specific. I'm not quite sure if the division of functionality between
> > this function and its only caller function is appropriate. As a
> > possible refactoring, we could have WaitForLSN() just return the
> > result as [reached, timedout, promoted] and delegate prerequisite
> > checks and error reporting to the SQL function.

I think WaitForLSNReplay() is a better name. And it's not clear what
API we would need, so I would prefer to keep it static for now.
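
For reference, the split Kyotaro sketched above could look like this
(purely hypothetical; the patch keeps the checks and error reporting
inside the function itself):

    typedef enum
    {
        WAIT_LSN_REACHED,       /* target LSN was replayed */
        WAIT_LSN_TIMED_OUT,     /* timeout elapsed first */
        WAIT_LSN_PROMOTED       /* server was promoted while waiting */
    } WaitLSNResult;

    static WaitLSNResult WaitForLSNReplay(XLogRecPtr targetLSN,
                                          int64 timeout);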

> waitLSN -> waitLSNStates
> No, waitLSNStates is not the best name, because waitLSNState is a state,
> and waitLSN is not an array of waitLSNStates. We can think about
> another name; what do you think?
>
> > ===== L524
> > + /* Shouldn't be called when shmem isn't initialized */
> > + Assert(waitLSN);
> >
> > Seeing this assertion, I feel that the name "waitLSN" is a bit
> > obscure. How about renaming it to "waitLSNStates"?

I agree that waitLSN is too generic. I've renamed it to waitLSNState.

> > ===== L527
> > + /* Should be only called by a backend */
> > + Assert(MyBackendType == B_BACKEND && MyProcNumber <= MaxBackends);
> >
> > This is somewhat excessive: it causes a server crash where ending with
> > an error would suffice. By the way, if I execute "CALL
> > pg_wal_replay_wait('0/0')" on a logical walsender, the server crashes.
> > The condition doesn't seem appropriate.
>
> Can you give more information on your server crashes, so I can
> reproduce them?

I've rechecked this. We don't need to assert that the process is a
backend, given that MaxBackends includes background workers too. I've
replaced this with just an assertion that MyProcNumber is valid.
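
The resulting check is a simple range assertion along these lines (a
sketch; the exact wording in the patch may differ):

    /* Any process with a valid ProcNumber may wait, not only backends. */
    Assert(MyProcNumber >= 0 && MyProcNumber < MaxBackends);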

> > ===== L565
> > + if (timeout > 0)
> > + {
> > + delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
> > + latch_events |= WL_TIMEOUT;
> > + if (delay_ms <= 0)
> > + break;
> >
> > "timeout" is immutable in the function. Therefore, we can calculate
> > "latch_events" before entering the loop. By the way, the name
> > 'latch_events' seems a bit off. Latch is a kind of event the function
> > can wait for. Therefore, something like wait_events might be more
> > appropriate.
>
> "wait_event" - it can't be, because in latch declaration, this events
> responsible for wake up and not for wait
> int WaitLatch(Latch *latch, int wakeEvents, long timeout, uint32
> wait_event_info)

I agree with Ivan's change of "latch_events" to "wake_events". I've
also made the change to calculate wake_events in advance, as proposed
by Kyotaro.

> > ==== L578
> > + if (rc & WL_LATCH_SET)
> > + ResetLatch(MyLatch);
> >
> > I think we usually reset latches unconditionally after returning from
> > WaitLatch(), even when waiting for timeouts.
>
> No, it depends on your logic: when you have several wake_events, you
> may want to determine which event set your latch.
> Check applyparallelworker.c:813.

+1,
It's not necessary to reset the latch if it hasn't been set. There are
a lot of places where we do this conditionally.
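
Putting both points together, the wait loop now has roughly this shape.
This is an approximation, not the patch code: the wait-event constant
WAIT_EVENT_WAIT_FOR_WAL_REPLAY is assumed, and targetLSN, timeout, and
endtime are set up by the enclosing function:

    int         wake_events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;

    if (timeout > 0)
        wake_events |= WL_TIMEOUT;  /* computed once, before the loop */

    for (;;)
    {
        long        delay_ms = -1;
        int         rc;

        /* Recheck the target after every wakeup. */
        if (targetLSN <= GetXLogReplayRecPtr(NULL))
            break;

        if (timeout > 0)
        {
            delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
            if (delay_ms <= 0)
                break;          /* reported as a timeout by the caller */
        }

        rc = WaitLatch(MyLatch, wake_events, delay_ms,
                       WAIT_EVENT_WAIT_FOR_WAL_REPLAY);

        /* Reset only if the latch actually fired, as discussed above. */
        if (rc & WL_LATCH_SET)
            ResetLatch(MyLatch);
    }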

> > ===== L798
> > + * A process number, same as the index of this item in
> > waitLSN->procInfos.
> > + * Stored for convenience.
> > + */
> > + int procnum;
> >
> > It is described as "(just) for convenience". However, it is referenced
> > by Startup to fetch the PGPROC entry for the waiter, which is
> > necessary for the startup process. That aside, why don't we hold (the
> > pointer to) procLatch instead of procnum? It makes things simpler, and
> > I believe it is our standard practice.
>
> Can you explain more deeply what you mean and give an example?

I wrote in the comment that we store procnum for convenience, meaning
that alternatively we could calculate it as the offset into the
waitLSN->procInfos array. But I like your idea to keep the latch
instead. This should give more flexibility for future improvements.

> > ===== L809
> > + /* A flag indicating that this item is added to waitLSN->waitersHeap
> > */
> > + bool inHeap;
> >
> > The name "inHeap" seems too leteral and might be hard to understand in
> > most occurances. How about using "waiting" instead?
>
> No, the literal "inHeap" indeed means "in the heap"; check the comment.
> Please suggest a more suitable name.

+1,
We use this flag to keep consistency during operations on the heap. I
don't see a reason to rename it.
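
For context, the per-waiter shared-memory entry after these changes looks
roughly like this (field names are illustrative; see the attached patch
for the real definition; types come from access/xlogdefs.h,
storage/latch.h, and lib/pairingheap.h):

    typedef struct WaitLSNProcInfo
    {
        XLogRecPtr  waitLSN;        /* LSN this process is waiting for */
        Latch      *latch;          /* set once waitLSN is replayed */
        pairingheap_node phNode;    /* node in the waiters pairing heap */
        bool        inHeap;         /* currently linked into the heap? */
    } WaitLSNProcInfo;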

> > ===== L940
> > +# Check that new data is visible after calling pg_wal_replay_wait()
> >
> > On the other hand, the comment for the check for this test states that
> >
> >> +# Make sure the current LSN on standby is the same as primary's LSN
> >> +ok($output eq 30, "standby reached the same LSN as primary");
> >
> > I think the first comment and the second should be consistent.
>
> Thanks, I'll rephrase this comment.

I had to revise this comment further.

> >> Oh, I forgot some notes about 044_wal_replay_wait_injection_test.pl.
> >>
> >> 1. It's not clear why this test needs node_standby2 at all. It seems
> >> useless.
> >
> > I agree with you. What we would need is a second *waiter client*
> > connecting to the same standby rather than a second standby. I would
> > like to have a test where the first waiter is removed while multiple
> > waiters are waiting, as well as a test where promotion occurs under
> > the same circumstances.
>
> Can you give more information about these cases step by step, and
> explain what "removed" and "promotion" mean?
>
> >> 2. The target LSN is set to pg_current_wal_insert_lsn() + 10000. This
> >> location seems to be unachievable in this test. So, it's not clear
> >> what race condition this test could potentially detect.
> >> 3. I think it would make sense to check for the race condition
> >> reported by Heikki. That is to insert the injection point at the
> >> beginning of WaitLSNSetLatches().
> >
> > I think the race condition you mentioned refers to the inconsistency
> > between the inHeap flag and the pairing heap caused by a race
> > condition between timeout and wakeup (or perhaps other combinations?
> > I'm not sure which version of the patch the mentioned race condition
> > refers to). However, I imagine it is difficult to reliably reproduce
> > this condition. In that regard, in the latest patch, the coherence
> > between the inHeap flag and the pairing heap is protected by LWLock,
> > so I believe we no longer need that test.
>
> No, Alexander means the race condition Heikki pointed out just before
> the LWLock. We can inject an injection point and step into it in a
> backend, but WaitLSNSetLatches() is called from the recovery process.
> And I have trouble waking up the injection point on the server.

My initial hope was to add an elegant enough test to check for
concurrency issues between the startup process and an LSN waiter
process. That test should cover the issue reported by Heikki and
hopefully more. I've tried to revise
044_wal_replay_wait_injection_test.pl, but it became cumbersome, and in
the end it was unclear what it was intended to test. This is why I
finally decided to remove it.

Regarding tests for multiple waiters, 043_wal_replay_wait.pl has some.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v21-0001-Implement-pg_wal_replay_wait-stored-procedure.patch application/octet-stream 36.1 KB
