From: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
---|---|
To: | "'Oh, Mike'" <minsoo(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com> |
Subject: | RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns |
Date: | 2021-09-24 08:02:38 |
Message-ID: | OSBPR01MB488815590FF2EE07B612A394EDA49@OSBPR01MB4888.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi
On Tuesday, March 16, 2021 1:35 AM Oh, Mike <minsoo(at)amazon(dot)com> wrote:
> We noticed that the logical replication could fail when the
> Standby::RUNNING_XACT record is generated in the middle of a catalog
> modifying transaction and if the logical decoding has to restart from the
> RUNNING_XACT
> WAL entry.
...
> Proposed solution:
> If we’re decoding a catalog modifying commit record, then check whether
> it’s part of the RUNNING_XACT xid’s processed @ the restart_lsn. If so,
> then add its xid & subxacts in the committed txns list in the logical decoding
> snapshot.
>
> Please refer to the attachment for the proposed patch.
Let me share some review comments for the patch.
(1) last_running declaration
Isn't it better to add static for this variable,
because we don't use this in other places ?
@@ -85,6 +86,9 @@ static bool DecodeTXNNeedSkip(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf, Oid dbId,
RepOriginId origin_id);
+/* record previous restart_lsn running xacts */
+xl_running_xacts *last_running = NULL;
(2) DecodeStandbyOp's memory free
I'm not sure when
we pass this condition with already allocated last_running,
but do you need to free it's xid array here as well,
if last_running isn't null ?
Otherwise, we'll miss the chance after this.
+ /* record restart_lsn running xacts */
+ if (MyReplicationSlot && (buf->origptr == MyReplicationSlot->data.restart_lsn))
+ {
+ if (last_running)
+ free(last_running);
+
+ last_running = NULL;
(3) suggestion of small readability improvement
We calculate the same size twice here and DecodeCommit.
I suggest you declare a variable that stores the computed result of size,
which might shorten those codes.
+ /*
+ * xl_running_xacts contains a xids Flexible Array
+ * and its size is subxcnt + xcnt.
+ * Take that into account while allocating
+ * the memory for last_running.
+ */
+ last_running = (xl_running_xacts *) malloc(sizeof(xl_running_xacts)
+ + sizeof(TransactionId )
+ * (running->subxcnt + running->xcnt));
+ memcpy(last_running, running, sizeof(xl_running_xacts)
+ + (sizeof(TransactionId)
+ * (running->subxcnt + running->xcnt)));
Best Regards,
Takamichi Osumi
From | Date | Subject | |
---|---|---|---|
Next Message | Jaime Casanova | 2021-09-24 08:26:26 | Re: Use simplehash.h instead of dynahash in SMgr |
Previous Message | Dilip Kumar | 2021-09-24 07:50:35 | Re: Gather performance analysis |