From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com> |
Subject: | Re: Column Filtering in Logical Replication |
Date: | 2022-04-14 03:39:31 |
Message-ID: | CAA4eK1LHevgWSsHHD2VHCMNnqqopdyY9fSZFjKPoTwA3vPy81g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 14, 2022 at 8:32 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Apr 13, 2022 at 6:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Apr 13, 2022 at 1:41 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I've looked at this issue and had the same analysis. Also, I could
> > > reproduce this issue with the steps shared by Amit.
> > >
> > > As I mentioned in another thread[1], the fact that the tablesync
> > > worker doesn't check the return value from
> > > wait_for_worker_state_change() seems a bug to me. So my initial
> > > thought of the solution is that we can have the tablesync worker check
> > > the return value and exit if it's false. That way, the apply worker
> > > can restart and request to launch the tablesync worker again. What do
> > > you think?
> > >
> >
> > I think that will fix this symptom but I am not sure if that would be
> > the best way to deal with this because we have a mechanism where the
> > sync worker can continue even if we don't do anything as a result of
> > wait_for_worker_state_change() provided apply worker restarts.
>
> I think we can think this is a separate issue. That is, if tablesync
> worker can start streaming changes even without waiting for the apply
> worker to set SUBREL_STATE_CATCHUP, do we really need the wait? I'm
> not sure it's really safe. If it's safe, the tablesync worker will no
> longer need to wait there.
>
As per my understanding, it is safe, whatever is streamed by tablesync
worker will be skipped later by apply worker. The wait here avoids
streaming the same data both by the apply worker and table sync worker
which I think is good even if it is not a must.
> >
> > The other part of the puzzle is the below check in the code:
> > /*
> > * If we reached the sync worker limit per subscription, just exit
> > * silently as we might get here because of an otherwise harmless race
> > * condition.
> > */
> > if (nsyncworkers >= max_sync_workers_per_subscription)
> >
> > It is not clear to me why this check is there, if this wouldn't be
> > there, the user would have got either a WARNING to increase the
> > max_logical_replication_workers or the apply worker would have been
> > restarted. Do you have any idea about this?
>
> Yeah, I'm also puzzled with this check. It seems that this function
> doesn't work well when the apply worker is not running and some
> tablesync workers are running. I initially thought that the apply
> worker calls to this function as many as tables that needs to be
> synced, but it checks the max_sync_workers_per_subscription limit
> before calling to logicalrep_worker_launch(). So I'm not really sure
> we need this check.
>
I just hope that the original author Petr J. responds to this point. I
have added him to this email. This will help us to find the best
solution for this problem.
Note: I'll be away for the remaining week, so will join the discussion
next week unless we reached the conclusion by that time.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | houzj.fnst@fujitsu.com | 2022-04-14 03:42:39 | RE: Perform streaming logical transactions by background workers and parallel apply |
Previous Message | Amit Kapila | 2022-04-14 03:25:18 | Re: PG DOCS - logical replication filtering |