Re: Excessive number of replication slots for 12->14 logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hubert Lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Date: 2022-08-04 07:41:29
Message-ID: CAD21AoBGXi9nFsQTvgF-NfqcvoGOA9SrgSRwoEdGevEHvTm53g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Aug 1, 2022 at 3:46 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Mon, Aug 1, 2022 at 11:50 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Here is my review comment for v4-0001.
> >
> > (Just a nitpick about comments)
> >
> > ======
> >
> > 1. src/backend/replication/logical/tablesync.c - process_syncing_tables_for_sync
> >
> > There are really just 2 main things that are being done for the cleanup here:
> > - Drop the origin tracking
> > - Drop the tablesync slot
> >
> > So, IMO the code will have more clarity by having one bigger comment
> > for each of those drops, instead of commenting every step separately.
> >
> > e.g.
> >
> > BEFORE
> > /*
> > * Cleanup the tablesync origin tracking if exists.
> > */
> > ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
> > MyLogicalRepWorker->relid,
> > originname,
> > sizeof(originname));
> >
> > /*
> > * Reset the origin session before dropping.
> > *
> > * This is required to reset the ownership of the slot
> > * and allow the origin to be dropped.
> > */
> > replorigin_session_reset();
> >
> > replorigin_session_origin = InvalidRepOriginId;
> > replorigin_session_origin_lsn = InvalidXLogRecPtr;
> > replorigin_session_origin_timestamp = 0;
> >
> > /*
> > * There is a chance that the user is concurrently performing
> > * refresh for the subscription where we remove the table
> > * state and its origin and by this time the origin might be
> > * already removed. So passing missing_ok = true.
> > */
> > replorigin_drop_by_name(originname, true, false);
> >
> > /* Cleanup the tablesync slot. */
> > ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
> > MyLogicalRepWorker->relid,
> > syncslotname,
> > sizeof(syncslotname));
> >
> > /*
> > * It is important to give an error if we are unable to drop the slot,
> > * otherwise, it won't be dropped till the corresponding subscription
> > * is dropped. So passing missing_ok = false.
> > */
> > ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false);
> >
> >
> > SUGGESTION
> >
> > /*
> > * Cleanup the tablesync origin tracking.
> > *
> > * Resetting the origin session removes the ownership of the slot.
> > * This is needed to allow the origin to be dropped.
> > *
> > * Also, there is a chance that the user is concurrently performing
> > * refresh for the subscription where we remove the table
> > * state and its origin and by this time the origin might be
> > * already removed. So passing missing_ok = true.
> > */
> > ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
> > MyLogicalRepWorker->relid,
> > originname,
> > sizeof(originname));
> > replorigin_session_reset();
> > replorigin_session_origin = InvalidRepOriginId;
> > replorigin_session_origin_lsn = InvalidXLogRecPtr;
> > replorigin_session_origin_timestamp = 0;
> >
> > replorigin_drop_by_name(originname, true, false);
> >
> > /*
> > * Cleanup the tablesync slot.
> > *
> > * It is important to give an error if we are unable to drop the slot,
> > * otherwise, it won't be dropped till the corresponding subscription
> > * is dropped. So passing missing_ok = false.
> > */
> > ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
> > MyLogicalRepWorker->relid,
> > syncslotname,
> > sizeof(syncslotname));
> > ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false);
> >
> Updated.

Thank you for working on this. I have a comment and a question:

* This has to be done after updating the state
because otherwise if
* there is an error while doing the database
operations we won't be
- * able to rollback dropped slot.
+ * able to rollback dropped slot or origin tracking.

I think we can actually roll back dropping the replication origin. So
the above comment is true for only replication slots.

---
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+ replorigin_session_origin_lsn = InvalidXLogRecPtr;
+ replorigin_session_origin_timestamp = 0;
+
+ replorigin_drop_by_name(originname, true, false);

With this change, the committing the ongoing transaction will be done
without replication origin. Is this okay? it's probably okay, but
since tablesync worker commits other changes with replication origin
I'm concerned a bit there might be a corner case.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-08-04 07:55:35 Re: [PATCH] BUG FIX: inconsistent page found in BRIN_REGULAR_PAGE
Previous Message Ravulapati, Gautham 2022-08-04 03:58:02 ERROR: unterminated dollar-quoted string at or near "$$"