From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Race condition in FetchTableStates() breaks synchronization of subscription tables |
Date: | 2024-02-08 09:25:23 |
Message-ID: | CALDaNm0X8oUiW1CzniPZsDxqjP-VoYuEvb1h7NFXohKc1P5HEw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, 6 Feb 2024 at 18:30, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> 05.02.2024 13:13, vignesh C wrote:
> > Thanks for the steps for the issue, I was able to reproduce this issue
> > in my environment with the steps provided. The attached patch has a
> > proposed fix where the latch will not be set in case of the apply
> > worker exiting immediately after starting.
>
> It looks like the proposed fix doesn't help when ApplyLauncherWakeup() is
> called by a backend executing the CREATE SUBSCRIPTION command.
> That is, with the v4-0002 patch applied and pg_usleep(300000L); added
> just below
> if (!worker_in_use)
> return worker_in_use;
> I still observe the test 027_nosuperuser running for 3+ minutes:
> t/027_nosuperuser.pl .. ok
> All tests successful.
> Files=1, Tests=19, 187 wallclock secs ( 0.01 usr 0.00 sys + 4.82 cusr 4.47 csys = 9.30 CPU)
>
> IIUC, it's because a launcher wakeup call, sent by "CREATE SUBSCRIPTION
> regression_sub ...", gets missed when the launcher is waiting for another
> worker (the logical replication worker for subscription "admin_sub"),
> launched just before that command, to start.
Yes, the wakeup call sent by the "CREATE SUBSCRIPTION" command was
getting missed in this case. The wakeup call can be sent during
subscription creation/modification and when the apply worker exits.
WaitForReplicationWorkerAttach should not reset the latch here, as
doing so can consume that wakeup and delay the start of the apply
worker until the 180-second timeout (DEFAULT_NAPTIME_PER_CYCLE)
expires. The attached patch does not reset the latch there and instead
lets ApplyLauncherMain reset the latch and check whether any new or
missing worker needs to be started.
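To illustrate why resetting the latch inside the attach wait loses the
wakeup, here is a minimal single-threaded sketch. It is not PostgreSQL
code and not taken from the patch: SimLatch and all sim_* functions are
hypothetical stand-ins for the real latch API, and the only figure taken
from this thread is the 180-second nap time.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the 180 s DEFAULT_NAPTIME_PER_CYCLE mentioned above. */
#define SIM_NAPTIME_SECS 180

/* Hypothetical stand-in for a process latch: just a flag. */
typedef struct SimLatch
{
    bool is_set;
} SimLatch;

static void sim_set_latch(SimLatch *latch)   { latch->is_set = true; }
static void sim_reset_latch(SimLatch *latch) { latch->is_set = false; }

/* Returns the simulated seconds spent waiting: 0 if already set, else the timeout. */
static int sim_wait_latch(const SimLatch *latch, int timeout_secs)
{
    return latch->is_set ? 0 : timeout_secs;
}

/*
 * Models the launcher waiting for a freshly started worker to attach.
 * While it waits, a backend running CREATE SUBSCRIPTION sets the latch.
 * If reset_in_wait is true (the behaviour being removed), that wakeup is
 * consumed here and never reaches the main loop.
 */
static void sim_wait_for_worker_attach(SimLatch *latch, bool reset_in_wait)
{
    sim_set_latch(latch);          /* wakeup arrives during the wait */
    if (reset_in_wait)
        sim_reset_latch(latch);    /* wakeup is lost */
}

/*
 * Returns the simulated delay, in seconds, before the launcher's main loop
 * rescans subscriptions and notices the newly created one.
 */
int sim_delay_until_rescan(bool reset_in_wait)
{
    SimLatch latch = { false };
    int      waited;

    sim_wait_for_worker_attach(&latch, reset_in_wait);

    /* Main loop: sleep until the latch is set or the nap time expires. */
    waited = sim_wait_latch(&latch, SIM_NAPTIME_SECS);

    /* Only the main loop resets the latch, right before rescanning. */
    sim_reset_latch(&latch);
    return waited;
}
```

In this model, sim_delay_until_rescan(true) returns the full nap time,
while sim_delay_until_rescan(false) returns 0 because the pending wakeup
is still visible when the main loop goes to sleep.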
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v5-0002-Apply-worker-will-get-started-after-180-seconds-b.patch | text/x-patch | 2.2 KB |
v5-0001-Table-sync-missed-for-newly-added-tables-because-.patch | text/x-patch | 3.3 KB |