From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Hubert Lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Excessive number of replication slots for 12->14 logical replication |
Date: | 2022-09-09 21:48:54 |
Message-ID: | CAD21AoAw0Oofi4kiDpJBOwpYyBBBkJj=sLUOn4Gd2GjUAKG-fw@mail.gmail.com |
Lists: | pgsql-bugs |
On Tue, Aug 30, 2022 at 3:44 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Aug 26, 2022 at 7:04 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > Thanks for the testing. I'll push this sometime early next week (by
> > Tuesday) unless Sawada-San or someone else has any comments on it.
> >
>
> Pushed.
Tom reported buildfarm failures[1]. I've investigated the cause and
concluded that this commit is related.
In process_syncing_tables_for_sync(), we have the following code:
    UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                               MyLogicalRepWorker->relid,
                               MyLogicalRepWorker->relstate,
                               MyLogicalRepWorker->relstate_lsn);

    ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
                                      MyLogicalRepWorker->relid,
                                      originname,
                                      sizeof(originname));

    replorigin_session_reset();
    replorigin_session_origin = InvalidRepOriginId;
    replorigin_session_origin_lsn = InvalidXLogRecPtr;
    replorigin_session_origin_timestamp = 0;

    /*
     * We expect that origin must be present. The concurrent operations
     * that remove origin like a refresh for the subscription take an
     * access exclusive lock on pg_subscription which prevent the previous
     * operation to update the rel state to SUBREL_STATE_SYNCDONE to
     * succeed.
     */
    replorigin_drop_by_name(originname, false, false);

    /*
     * End streaming so that LogRepWorkerWalRcvConn can be used to drop
     * the slot.
     */
    walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli);

    /*
     * Cleanup the tablesync slot.
     *
     * This has to be done after the data changes because otherwise if
     * there is an error while doing the database operations we won't be
     * able to rollback dropped slot.
     */
    ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
                                    MyLogicalRepWorker->relid,
                                    syncslotname,
                                    sizeof(syncslotname));
We assumed that if the tablesync worker errors out at
walrcv_endstreaming(), both the drop of the replication origin and the
relstate update are rolled back. That assumption was wrong: the
replication origin is not dropped, but its in-memory state has already
been reset. Therefore, when the tablesync worker restarts, it starts
logical replication from 0/0 and ends up applying a transaction that
has already been applied.
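To make the failure mode concrete, here is a toy model of it. It is
only a sketch and not PostgreSQL code: every type and function name
below (ToyOrigin, ToyTxn, toy_*) is hypothetical, and the model assumes
nothing beyond what is described above, namely that an abort restores
the origin's catalog entry but not its in-memory progress.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are PostgreSQL APIs. */

typedef uint64_t ToyLsn;        /* stand-in for an LSN; 0 plays the role of 0/0 */

typedef struct ToyOrigin
{
    bool    catalog_row_exists; /* transactional: restored on abort */
    ToyLsn  remote_lsn;         /* in-memory progress: NOT restored on abort */
} ToyOrigin;

typedef struct ToyTxn
{
    ToyOrigin  *origin;
    bool        saved_catalog_row_exists;   /* undo information for abort */
} ToyTxn;

/* Start a transaction, remembering only the transactional state. */
void
toy_begin(ToyTxn *txn, ToyOrigin *origin)
{
    txn->origin = origin;
    txn->saved_catalog_row_exists = origin->catalog_row_exists;
}

/* Drop the origin: one transactional change plus one in-memory reset. */
void
toy_drop_origin(ToyTxn *txn)
{
    txn->origin->catalog_row_exists = false;    /* undone by toy_abort() */
    txn->origin->remote_lsn = 0;                /* nothing undoes this */
}

/* Abort: only the catalog change is rolled back. */
void
toy_abort(ToyTxn *txn)
{
    txn->origin->catalog_row_exists = txn->saved_catalog_row_exists;
}

/* Where a restarted worker would begin streaming from. */
ToyLsn
toy_restart_start_lsn(const ToyOrigin *origin)
{
    return origin->catalog_row_exists ? origin->remote_lsn : 0;
}

/* Drop the origin, fail in the next step, abort, then restart. */
ToyLsn
toy_simulate_failed_cleanup(ToyLsn progress_before_error)
{
    ToyOrigin   origin = {true, progress_before_error};
    ToyTxn      txn;

    toy_begin(&txn, &origin);
    toy_drop_origin(&txn);
    /* ... the step after the drop raises an error here ... */
    toy_abort(&txn);

    return toy_restart_start_lsn(&origin);
}
```

In this model toy_simulate_failed_cleanup() returns 0 whatever progress
was recorded before the error: the origin still exists after the abort,
but the restarted worker no longer knows how far it had replicated.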
Regards,
[1] https://www.postgresql.org/message-id/115136.1662733870%40sss.pgh.pa.us
--
Masahiko Sawada