Quick Links

Re: Column Filtering in Logical Replication

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject:	Re: Column Filtering in Logical Replication
Date:	2022-04-13 09:45:41
Message-ID:	CAA4eK1J03HJC-6bUsdHnO8O6W89T-CJ1NaqA5HuxTYikwA-HOQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Apr 13, 2022 at 1:41 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've looked at this issue and had the same analysis. Also, I could
> reproduce this issue with the steps shared by Amit.
>
> As I mentioned in another thread[1], the fact that the tablesync
> worker doesn't check the return value from
> wait_for_worker_state_change() seems a bug to me. So my initial
> thought of the solution is that we can have the tablesync worker check
> the return value and exit if it's false. That way, the apply worker
> can restart and request to launch the tablesync worker again. What do
> you think?
>

I think that will fix this symptom but I am not sure if that would be
the best way to deal with this because we have a mechanism where the
sync worker can continue even if we don't do anything as a result of
wait_for_worker_state_change() provided apply worker restarts.

The other part of the puzzle is the below check in the code:
/*
* If we reached the sync worker limit per subscription, just exit
* silently as we might get here because of an otherwise harmless race
* condition.
*/
if (nsyncworkers >= max_sync_workers_per_subscription)

It is not clear to me why this check is there, if this wouldn't be
there, the user would have got either a WARNING to increase the
max_logical_replication_workers or the apply worker would have been
restarted. Do you have any idea about this?

Yet another option is that we ensure that before launching sync
workers (say in process_syncing_tables_for_apply->FetchTableStates,
when we have to start a new transaction) we again call
maybe_reread_subscription(), which should also fix this symptom. But
again, I am not sure why it should be compulsory to call
maybe_reread_subscription() in such a situation, there are no comments
which suggest it,

Now, the reason why it appeared recently in commit c91f71b9dc is that
I think we have increased the number of initial table syncs in that
test, and probably increasing
max_sync_workers_per_subscription/max_logical_replication_workers
should fix that test.

--
With Regards,
Amit Kapila.

In response to

Re: Column Filtering in Logical Replication at 2022-04-13 08:10:27 from Masahiko Sawada

Responses

Re: Column Filtering in Logical Replication at 2022-04-14 03:02:16 from Masahiko Sawada

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Kapila	2022-04-13 09:50:20	Re: Support logical replication of DDLs
Previous Message	Alvaro Herrera	2022-04-13 09:29:39	Re: pgbench translation