RE: Conflict detection for update_deleted in logical replication

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-10-30 08:33:44
Message-ID: TYCPR01MB5693972B28E24E102743FDDAF5542@TYCPR01MB5693.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Hou,

Thanks for updating the patch! Here are my comments.

01. CreateSubscription
```
+ if (opts.detectupdatedeleted && !track_commit_timestamp)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("detecting update_deleted conflicts requires \"%s\" to be enabled",
+ "track_commit_timestamp"));
```

I don't think this guard is sufficient. I found two cases:

* Creates a subscription with detect_update_deleted = false and track_commit_timestamp = true,
then alters detect_update_deleted to true.
* Creates a subscription with detect_update_deleted = true and track_commit_timestamp = true,
then update track_commit_timestamp to true and restart the instance.

Based on that, how about detecting the inconsistency on the apply worker? It check
the parameters and do error out when it starts or re-reads a catalog. If we want
to detect in SQL commands, this can do in parse_subscription_options().

02. AlterSubscription()
```
+ ApplyLauncherWakeupAtCommit();
```

The reason why launcher should wake up is different from other parts. Can we add comments
that it is needed to track/untrack the xmin?

03. build_index_column_bitmap()
```
+ for (int i = 0; i < indexinfo->ii_NumIndexAttrs; i++)
+ {
+ int keycol = indexinfo->ii_IndexAttrNumbers[i];
+
+ index_bitmap = bms_add_member(index_bitmap, keycol);
+ }
```

I feel we can assert the ii_IndexAttrNumbers is valid, because the passed index is a replica identity key.

04. LogicalRepApplyLoop()

Can we move the definition of "phase" to the maybe_advance_nonremovable_xid() and
make it static? The variable is used only by the function.

05. LogicalRepApplyLoop()
```
+ UpdateWorkerStats(last_received, timestamp, false);
```

The statistics seems not correct. last_received is not sent at "timestamp", it had
already been sent earlier. Do we really have to update here?

06. ErrorOnReservedSlotName()

I feel we should document that the slot name 'pg_conflict_detection' cannot be specified
unconditionally.

07. General

update_deleted can happen without DELETE commands. Should we rename the conflict
reason, like 'update_target_modified'?

E.g., there is a 2-way replication system and below transactions happen:

Node A:
T1: INSERT INTO t (id, value) VALUES (1,1); ts = 10.00
T2: UPDATE t SET id = 2 WHERE id = 1; ts = 10.02
Node B:
T3: UPDATE t SET value = 2 WHERE id = 1; ts = 10.01

Then, T3 comes to Node A after executing T2. T3 tries to find id = 1 but find a
dead tuple instead. In this case, 'update_delete' happens without the delete.

08. Others

Also, here is an analysis related with the truncation of commit timestamp. I worried the
case that commit timestamp might be removed so that the detection would not go well.
But it seemed that entries can be removed when it behinds GetOldestNonRemovableTransactionId(NULL),
i.e., horizons.shared_oldest_nonremovable. The value is affected by the replication
slots so that interesting commit_ts entries for us are not removed.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-10-30 08:41:03 Re: protocol-level wait-for-LSN
Previous Message Joel Jacobson 2024-10-30 08:14:41 Re: New "raw" COPY format