From: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication |
Date: | 2023-08-02 09:42:07 |
Message-ID: | CAGPVpCTuwTwAh8V8EcaKyea+RTk32CWUVX5Der13jrgk8wB5_Q@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote on Wed, 2 Aug 2023 at 12:01:
> I think we are getting the error (ERROR: could not find logical
> decoding starting point) because we wouldn't have waited for WAL to
> become available before reading it. It could happen due to the
> following code:
> WalSndWaitForWal()
> {
> ...
> if (streamingDoneReceiving && streamingDoneSending &&
> !pq_is_send_pending())
> break;
> ..
> }
>
> Now, it seems that in 0003 patch, instead of resetting flags
> streamingDoneSending, and streamingDoneReceiving before start
> replication, we should reset before create logical slots because we
> need to read the WAL during that time as well to find the consistent
> point.
>
Thanks for the suggestion, Amit. I had been looking into this recently but
couldn't figure out the cause until now.
I quickly made the fix in 0003: the streamingDoneSending and
streamingDoneReceiving flags are now reset before the logical slot is
created. It seems to have resolved the "could not find logical decoding
starting point" errors.
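
To illustrate the failure mode discussed above, here is a minimal toy model in C. It is not PostgreSQL source: the ToyWalSender struct, the toy_* functions, and the tick-based WAL arrival are all hypothetical stand-ins. Only the shape of the early-exit condition and the flag names come from the quoted snippet. It sketches why stale "done" flags on a reused connection can make the wait loop give up before the WAL needed for the consistent point is available.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-connection walsender state. */
typedef struct ToyWalSender
{
    bool streamingDoneSending;
    bool streamingDoneReceiving;
    bool sendPending;   /* models the result of pq_is_send_pending() */
    long flushedUpto;   /* how much WAL is currently readable */
} ToyWalSender;

/*
 * Models the wait loop: WAL arrives in fixed steps until 'target' is
 * reached, unless the early-exit condition fires first.
 * Returns the position up to which WAL is readable when the loop ends.
 */
static long
toy_wait_for_wal(ToyWalSender *ws, long target, long wal_per_tick)
{
    assert(wal_per_tick > 0);

    while (ws->flushedUpto < target)
    {
        /* Same shape as the condition quoted above. */
        if (ws->streamingDoneReceiving && ws->streamingDoneSending &&
            !ws->sendPending)
            break;      /* gives up without waiting for more WAL */

        ws->flushedUpto += wal_per_tick;
    }
    return ws->flushedUpto;
}

/* Clears the flags left over from a previous streaming session. */
static void
toy_reset_streaming_flags(ToyWalSender *ws)
{
    ws->streamingDoneSending = false;
    ws->streamingDoneReceiving = false;
}

/*
 * Models slot creation, which must read WAL up to 'consistent_point'.
 * Returns true if enough WAL was readable to find the starting point.
 */
static bool
toy_create_logical_slot(ToyWalSender *ws, long consistent_point,
                        bool reset_first)
{
    if (reset_first)
        toy_reset_streaming_flags(ws);

    return toy_wait_for_wal(ws, consistent_point, 10) >= consistent_point;
}
```

In this model, calling toy_create_logical_slot() on a sender whose flags are still set from an earlier session fails unless reset_first is true, which mirrors the suggestion to reset the flags before creating the slot rather than only before starting replication.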
vignesh C <vignesh21(at)gmail(dot)com> wrote on Tue, 1 Aug 2023 at 09:32:
> I agree that the "no copy in progress" issue has nothing to do with
> 0001 patch. This issue is present with the 0002 patch.
> In the case when the tablesync worker has to apply the transactions
> after the table is synced, the tablesync worker sends the feedback of
> writepos, applypos and flushpos which results in "No copy in progress"
> error as the stream has ended already. Fixed it by exiting the
> streaming loop if the tablesync worker is done with the
> synchronization. The attached 0004 patch has the changes for the same.
> The rest of the v22 patches are the same as those posted by Melih
> in the earlier mail.
Thanks for the fix. I placed it into 0002 with a slight change as follows:
-    send_feedback(last_received, false, false);
+    if (!MyLogicalRepWorker->relsync_completed)
+        send_feedback(last_received, false, false);
IMHO relsync_completed means essentially the same thing as streaming_done,
which is why I preferred checking that flag over adding another goto
statement. Does that make sense to you as well?
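
The effect of that guard can be sketched with a small toy model in C. This is an illustration only, not the patch: ToyWorker, toy_send_feedback(), toy_apply_loop(), and the copy_stream_open field are hypothetical, and only the relsync_completed flag name and the idea of skipping send_feedback come from the change above. The model assumes the stream is closed at the moment synchronization completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tablesync worker's state. */
typedef struct ToyWorker
{
    bool relsync_completed; /* set once the table is fully synced */
    bool copy_stream_open;  /* assumed: stream closes when sync completes */
    int  feedback_sent;     /* successful feedback messages */
    int  errors;            /* "no copy in progress"-style failures */
} ToyWorker;

/* Sending feedback on a closed stream is counted as an error. */
static void
toy_send_feedback(ToyWorker *w)
{
    if (!w->copy_stream_open)
    {
        w->errors++;
        return;
    }
    w->feedback_sent++;
}

/*
 * Processes n_messages; synchronization completes just before message
 * number sync_done_at is handled. With 'guard' set, feedback is skipped
 * once relsync_completed is true, as in the change above.
 */
static void
toy_apply_loop(ToyWorker *w, int n_messages, int sync_done_at, bool guard)
{
    assert(sync_done_at >= 0 && sync_done_at < n_messages);

    for (int i = 0; i < n_messages; i++)
    {
        if (i == sync_done_at)
        {
            w->relsync_completed = true;
            w->copy_stream_open = false;
        }

        if (!guard || !w->relsync_completed)
            toy_send_feedback(w);
    }
}
```

Without the guard, every message handled after synchronization completes triggers an error in this model; with the guard, feedback simply stops once the flag is set, so no extra control flow such as a goto is needed.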
Thanks,
--
Melih Mutlu
Microsoft
Attachment | Content-Type | Size |
---|---|---|
v23-0002-Reuse-Tablesync-Workers.patch | application/octet-stream | 10.3 KB |
v23-0001-Refactor-to-split-Apply-and-Tablesync-Workers.patch | application/octet-stream | 25.4 KB |
v23-0003-Reuse-connection-when-tablesync-workers-change-t.patch | application/octet-stream | 6.8 KB |