Fix a bug in DecodeAbort() and improve input data check on subscriber.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Fix a bug in DecodeAbort() and improve input data check on subscriber.
Date: 2021-12-07 00:36:19
Message-ID: CAD21AoBqhUqgDZUhUVnnwKRubPDNJ6m6fJDPgok3E5cWJLL+pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

While updating the patch I recently posted[1] to make pg_waldump
report replication origin ID, LSN, and timestamp, I found a bug that
replication origin timestamp is not set in ROLLBACK PREPARED case.
Commit 8bdb1332eb5 (CC'ed Amit) added an argument to
ReorderBufferFinishPrepared() but in ROLLBACK PREPARED case, the
caller specified it at the wrong position:

@@ -730,6 +730,7 @@ DecodeCommit(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf,
if (two_phase)
{
ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr,
buf->endptr,
+
SnapBuildInitialConsistentPoint(ctx->snapshot_builder),
commit_time, origin_id, origin_lsn,
parsed->twophase_gid, true);
}
@@ -868,6 +869,7 @@ DecodeAbort(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf,
{
ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr,
buf->endptr,
abort_time, origin_id, origin_lsn,
+ InvalidXLogRecPtr,
parsed->twophase_gid, false);
}

This affects the value of rollback_data.rollback_time on the
subscriber, resulting in setting the wrong timestamp to both the
replication origin timestamp and the error callback argument on the
subscriber. I've attached the patch to fix it.

Besides, I think we can improve checking input data on subscribers.
This bug was not detected by compilers but it could have been detected
if we checked the input data. Looking at logicalrep_read_xxx functions
in proto.c, there are some inconsistencies; we check the value of
prepare_data->xid in logicalrep_read_prepare_common() but we don't in
both logicalrep_read_commit_prepared() and
logicalrep_read_rollback_prepared(), and we don't check anything in
stream_start/stream_stop. Also, IIUC that since timestamps are always
set in prepare/commit prepared/rollback prepared cases we can check
them too. I've attached a PoC patch that introduces macros for
checking input data and adds some new checks. Since it could be
frequently called, I used unlikely() but probably we can also consider
replacing elog(ERROR) with assertions.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoD2dJfgsdxk4_KciAZMZQoUiCvmV9sDpp8ZuKLtKCNXaA%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
0001-Fix-a-bug-of-passing-incorrect-arguments-to-ReorderB.patch application/x-patch 1022 bytes
0002-Improve-input-data-check-of-logical-replication.patch application/x-patch 6.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2021-12-07 00:52:34 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Previous Message Greg Nancarrow 2021-12-07 00:22:08 Re: Optionally automatically disable logical replication subscriptions on error