From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-05-09 02:19:47
Message-ID: CAD21AoDytm9ziQkGty81ugsHZmzNJ_DzYVNzFPVi-pSnP97k_w@mail.gmail.com
Lists: pgsql-hackers
On Mon, May 8, 2023 at 8:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 5, 2023 at 9:14 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, May 3, 2023 3:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > Attached is another patch to fix the problem that pa_shutdown will access an
> > invalid MyLogicalRepWorker. I personally want to avoid introducing a new static
> > variable, so I only reordered the callback registration in this version.
> >
> > While testing this, I noticed a rare case where the leader can receive the
> > worker termination message after it has stopped the parallel worker. This is
> > unnecessary and carries a risk that the leader would try to access the
> > already-detached memory queue. This is more likely to happen, and sometimes
> > causes failures in the regression tests, after the registration reorder patch
> > because the dsm is detached earlier once that patch is applied.
> >
>
> I think it is only possible for the leader apply worker to try to
> receive the error message from an error queue after your 0002 patch,
> because the other place already detaches from the queue before stopping
> the parallel apply workers. So, I combined both patches and changed a
> few comments and the commit message. Let me know what you think of the
> attached.
I have one comment on the part that detaches the error queue:
+	/*
+	 * Detach from the error_mq_handle for the parallel apply worker before
+	 * stopping it. This prevents the leader apply worker from trying to
+	 * receive the message from the error queue that might already be
+	 * detached by the parallel apply worker.
+	 */
+	shm_mq_detach(winfo->error_mq_handle);
+	winfo->error_mq_handle = NULL;
In pa_detach_all_error_mq(), we try to detach the error queues of all
workers in the pool. I think we should check there whether the queue has
already been detached (i.e., is NULL). Otherwise, we will end up with a
SEGV if an error happens after detaching the error queue and before
removing the worker from the pool.
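
For reference, a minimal sketch of what that check could look like. The
List-based iteration over ParallelApplyWorkerPool and the exact field
names are my assumptions about the surrounding code, not taken from the
posted patch:

void
pa_detach_all_error_mq(void)
{
	ListCell   *lc;

	foreach(lc, ParallelApplyWorkerPool)
	{
		ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) lfirst(lc);

		/*
		 * Skip a queue that was already detached (and set to NULL)
		 * before the worker was removed from the pool, so we never
		 * pass a NULL handle to shm_mq_detach().
		 */
		if (winfo->error_mq_handle)
		{
			shm_mq_detach(winfo->error_mq_handle);
			winfo->error_mq_handle = NULL;
		}
	}
}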
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com