Re: Excessive number of replication slots for 12->14 logical replication

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hubert Lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Date: 2022-07-29 11:35:57
Message-ID: CAFPTHDYgyNfhaR7UkFEYrMZgjrtat0Ry=hezNc6M_rwEuKi4MQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jul 29, 2022 at 7:37 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some comments for v3-0001:
>
> (all cosmetic / comments)
>
> ======
>
> 0. <apply>
>
> There were multiple whitespace warnings reported by git apply.
>
> [postgres(at)CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:36:
> indent with spaces.
> /*
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:37:
> indent with spaces.
> * Cleanup the tablesync slot and the origin tracking if exists.
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:38:
> indent with spaces.
> *
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:39:
> indent with spaces.
> * This has to be done after updating the state because otherwise if
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:40:
> indent with spaces.
> * there is an error while doing the database operations we won't be
> warning: squelched 3 whitespace errors
> warning: 8 lines add whitespace errors.
>
> ======
>

Fixed.

> 1. Commit message
>
> I think the Replication Origin tracking is maybe not strictly speaking
> a "slot", even though it does use the same GUC as true slots. With
> that in mind, perhaps the following text is more accurate.
>
> SUGGESTION
> The replication origin tracking uses the same GUC
> (max_replication_slots) as the tablesync slots for limiting resources.
>
> In the current (HEAD) code the replication origin tracking of the
> completed tablesync worker is dropped by the apply worker, but this
> means there can be a small lag between when the tablesync worker
> exited, and when its origin tracking is actually dropped.
>
> Meanwhile, new tablesync workers can be launched and will immediately
> try to acquire new slots and origin tracking. If this happens before
> the worker's origin tracking gets dropped then the available number of
> slots (max_replication_slots) can be exceeded, leading to the error as
> reported.
>
> To avoid this lag, the dropping of replicating origin tracking is
> moved to the tablesync worker where it exits.
>
> ======

Updated as suggested.

>
> 2. src/backend/replication/logical/tablesync.c - process_syncing_tables_for_sync
>
> @@ -315,13 +316,43 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
> */
> walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli);
>
> + /*
> + * Cleanup the tablesync slot and the origin tracking if exists.
> + *
>
> Consider reversing that comment's first sentence so that it describes
> what it will do in the same order as the code logic.
>
> SUGGESTION
> Cleanup the origin tracking and tablesync slot.
>
> ~~~

Fixed.
>
> 3.
>
> + /*
> + * Reset the origin session before dropping.
> *
> - * This has to be done after updating the state because otherwise if
> - * there is an error while doing the database operations we won't be
> - * able to rollback dropped slot.
> + * This is required to reset the ownership of the slot
> + * and allow the slot to be dropped.
> */
>
> "slot to be dropped" -> "origin to be dropped" (maybe?)
>
> ~~~

Fixed

>
> 4. src/backend/replication/logical/tablesync.c -
> process_syncing_tables_for_apply
>
> @@ -451,27 +480,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
> started_tx = true;
> }
>
> - /*
> - * Remove the tablesync origin tracking if exists.
> - *
> - * The normal case origin drop is done here instead of in the
> - * process_syncing_tables_for_sync function because we don't
> - * allow to drop the origin till the process owning the origin
> - * is alive.
> - *
> - * There is a chance that the user is concurrently performing
> - * refresh for the subscription where we remove the table
> - * state and its origin and by this time the origin might be
> - * already removed. So passing missing_ok = true.
> - */
> - ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
> - rstate->relid,
> - originname,
> - sizeof(originname));
> - replorigin_drop_by_name(originname, true, false);
>
> /*
>
>
> This change results in a double blank line remaining instead of just a
> single blank line.
>

Fixed.

regards,
Ajin Cherian
Fujitsu Australia

Attachment Content-Type Size
v4-0001-fix-excessive-replication-origin-slots-issue.patch application/octet-stream 4.5 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2022-07-29 13:55:53 Re: BUG #17561: Server crashes on executing row() with very long argument list
Previous Message Richard Guo 2022-07-29 10:58:15 Re: BUG #17561: Server crashes on executing row() with very long argument list