Re: Improving the latch handling between logical replication launcher and worker processes.

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: hlinnaka(at)iki(dot)fi, kuroda(dot)hayato(at)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improving the latch handling between logical replication launcher and worker processes.
Date: 2024-09-04 11:24:35
Message-ID: CALDaNm0OWSfOBXiCJ4mhPaZikg9adbSU0z6FkY2Wdx9AUUHFXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 4 Sept 2024 at 08:32, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Tue, 3 Sep 2024 09:10:07 +0530, vignesh C <vignesh21(at)gmail(dot)com> wrote in
> > The attached v2 version patch has the changes for the same.
>
> Sorry for jumping in at this point. I've just reviewed the latest
> patch (v2), and the frequent Own/Disown-Latch operations caught my
> attention. Additionally, handling multiple concurrently operating
> trigger sources with nested latch waits seems bug-prone, which I’d
> prefer to avoid from both a readability and safety perspective.
>
> With that in mind, I’d like to suggest an alternative approach. I may
> not be fully aware of all the previous discussions, so apologies if
> this idea has already been considered and dismissed.
>
> Currently, WaitForReplicationWorkerAttach() and
> logicalrep_worker_stop_internal() wait on a latch after verifying the
> desired state. This ensures that even if there are spurious or missed
> wakeups, they won't cause issues. In contrast, ApplyLauncherMain()
> enters a latch wait without checking the desired state
> first. Consequently, if another process sets the latch to wake up the
> main loop while the former two functions are waiting, that wakeup
> could be missed. If my understanding is correct, the problem lies in
> ApplyLauncherMain() not checking the expected state before beginning
> to wait on the latch. There is no issue with waiting if the state
> hasn't been satisfied yet.
>
> So, I propose that ApplyLauncherMain() should check the condition that
> triggers a main loop wakeup before calling WaitLatch(). To do this, we
> could add a flag in LogicalRepCtxStruct to signal that the main loop
> has immediate tasks to handle. ApplyLauncherWakeup() would set this
> flag before sending SIGUSR1. In turn, ApplyLauncherMain() would check
> this flag before calling WaitLatch() and skip the WaitLatch() call if
> the flag is set.
>
> I think this approach could solve the issue without adding
> complexity. What do you think?

I agree that this approach is more simple than the other approach. How
about something like the attached patch to handle the same.

Regards,
Vignesh

Attachment Content-Type Size
v3-0001-Resolve-a-problem-where-detection-of-concurrent-s.patch text/x-patch 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-09-04 11:34:05 Re: Commit Timestamp and LSN Inversion issue
Previous Message jian he 2024-09-04 11:18:52 Re: First draft of PG 17 release notes