Re: CF Bot failure in wait_for_subscription_sync()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CF Bot failure in wait_for_subscription_sync()
Date: 2022-10-18 13:04:33
Message-ID: CAA4eK1LRouxE3sv=u_u_9doDPzV9arMyJPNvF-7ynJZiQXg1tg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 18, 2022 at 2:57 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, October 18, 2022 2:16 PM Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > I have seen 2 patches registered in CF failing on Linux - Debian Bullseye in
> > wait_for_subscription_sync(). It seems like the tables aren't being synced. I
> > have not done any further analysis. I'm not sure if this issue is being discussed
> > elsewhere.
> >
> > # Postmaster PID for node "twoways" is 50208 Waiting for all subscriptions in
> > "twoways" to synchronize data
> > [14:12:43.092](198.391s) # poll_query_until timed out executing this query:
> > # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r',
> > 's'); # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > timed out waiting for subscriber to synchronize data at t/100_bugs.pl line 147.
> >
> > https://api.cirrus-ci.com/v1/artifact/task/6618623857917952/log/src/test/sub
> > scription/tmp_check/log/regress_log_100_bugs
> > https://cirrus-ci.com/task/6618623857917952
> > https://cirrus-ci.com/task/5764058174455808
>
> Thanks for reporting this. I am not sure about the root cause but just share
> some initial analysis here.
>
> This testcase waits for table sync to finish for both table "t" and table "t2".
> But from the log, I can only see the log[1] related to the table sync of table
> "t". So it seems that the table sync worker for table "t2" was never started
> due to some reason.
>

Yeah, the reason is not clear to me either. Let me state my
understanding of the issue. IIUC, the following test in 100_bugs.pl
has failed:
$node_twoways->safe_psql('d1', 'ALTER PUBLICATION testpub ADD TABLE t2');
$node_twoways->safe_psql('d2',
'ALTER SUBSCRIPTION testsub REFRESH PUBLICATION');
...
...
$node_twoways->wait_for_subscription_sync($node_twoways, 'testsub', 'd2');

After the REFRESH operation, the new table should have been added to
pg_subscription_rel. This is also visible from LOGS because one of
tables has synced and for table another worker is not started. Now,
ideally, after the new entry is created in pg_subscription_rel, the
apply worker should have got an invalidation message and refreshed the
'table_states_not_ready' list which should have let it start the new
table sync worker for table 't2' but that is not happening here.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-10-18 13:24:22 Re: thinko in basic_archive.c
Previous Message Amit Kapila 2022-10-18 12:54:54 Re: Perform streaming logical transactions by background workers and parallel apply