From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Single transaction in the tablesync worker? |
Date: | 2021-01-21 10:17:23 |
Message-ID: | CAA4eK1LGxuB_RTfZ2HLJT76wv=FLV6UPqT+FWkiDg61rvQkkmQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Amit.
>
> PSA the v17 patch for the Tablesync Solution1.
>
Thanks for the updated patch. Below are few comments:
1. Why are we changing the scope of PG_TRY in DropSubscription()?
Also, it might be better to keep the replication slot drop part as it
is.
2.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a
FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE?
3.
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
The function name and message makes it sound like that we drop slot
and origin at any interrupt. Isn't it better to name it as
tablesync_cleanup_at_shutdown()?
4.
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */
I think we can avoid using missing_ok and drop_slot_needed variables.
5. Can we drop the origin along with the slot in
process_syncing_tables_for_sync() instead of
process_syncing_tables_for_apply()? I think this is possible because
of the other changes you made in origin.c. Also, if possible, we can
try to use the same code to drop the slot and origin in
tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync.
6.
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
There seems to be a typo (crashed/etc) in the above comment.
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
In this test, isn't it better to check for datasync state like below?
004_sync.pl has some other similar test.
my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
or die "Timed out while waiting for subscriber to start sync";
Is there a reason why we can't use the existing way to check for
failure in this case?
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2021-01-21 10:23:16 | Re: Printing LSN made easy |
Previous Message | Heikki Linnakangas | 2021-01-21 10:14:10 | Re: ResourceOwner refactoring |