From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Single transaction in the tablesync worker? |
Date: | 2021-02-05 01:39:22 |
Message-ID: | CAHut+PuxESrD8Yd=U+_3=fLkpHGk-EvAHpMb_-H_ZAqk45jv7A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
...
> Thanks. I have fixed one of the issues reported by me earlier [1]
> wherein the tablesync worker can repeatedly fail if after dropping the
> slot there is an error while updating the SYNCDONE state in the
> database. I have moved the drop of the slot just before commit of the
> transaction where we are marking the state as SYNCDONE. Additionally,
> I have removed unnecessary includes in tablesync.c, updated the docs
> for Alter Subscription, and updated the comments at various places in
> the patch. I have also updated the commit message this time.
>
Below are my feedback comments for V17 (nothing functional)
~~
1.
V27 Commit message:
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronizes the
position in the stream with the main apply worker.
Typo:
"synchronizes" -> "synchronize"
~~
2.
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable
class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, we are not able to remove the slots,
+ we give WARNING and the user needs to manually remove such slots later as
+ otherwise, they will continue to reserve WAL and might eventually cause
+ the disk to fill up. See also <xref
linkend="logical-replication-subscription-slot"/>.
+ </para>
I think the content is good, but the 1st-person wording seemed strange.
e.g.
"we are not able to remove the slots, we give WARNING and the user needs..."
Maybe it should be like:
"... PostgreSQL is unable to remove the slots, so a WARNING is
reported. The user needs... "
~~
3.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub,
bool copy_data)
...
+ * XXX If there is a network break down while dropping the
"network break down" -> "network breakdown"
~~
4.
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
...
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
"network break down" -> "network breakdown"
~~
5.
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int
cacheid, uint32 hashvalue)
...
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
Should this comment also describe why the missing_ok is false for this case?
----
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Ian Lawrence Barwick | 2021-02-05 01:40:12 | psql tab completion for CREATE DATABASE ... LOCALE |
Previous Message | Bharath Rupireddy | 2021-02-05 00:52:30 | Re: Is Recovery actually paused? |