From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: Conflict detection for update_deleted in logical replication |
Date: | 2025-02-05 11:07:48 |
Message-ID: | OS3PR01MB57189F6FAA2EC9D3594F5EC694F72@OS3PR01MB5718.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wednesday, February 5, 2025 3:45 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Jan 23, 2025 at 5:17 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> I was reviewing v26 patch set and have some comments so far I reviewed
> 0001 so most of the comments/question are from this patch.
>
> comments on v26-0001
Thanks for the comments !
>
> 1.
> + next_full_xid = ReadNextFullTransactionId(); epoch =
> + EpochFromFullTransactionId(next_full_xid);
> +
> + /*
> + * Adjust the epoch if the next transaction ID is less than the oldest
> + * running transaction ID. This handles the case where transaction ID
> + * wraparound has occurred.
> + */
> + if (oldest_running_xid > XidFromFullTransactionId(next_full_xid))
> + epoch--;
> +
> + full_xid = FullTransactionIdFromEpochAndXid(epoch,
> + oldest_running_xid);
>
> I think you can directly use the 'AdjustToFullTransactionId()'
> function here, maybe we can move that somewhere else and make that
> non-static function.
Thanks for the suggestion. I used the existing API
FullTransactionIdFromAllowableAt() in this version since it's also convenient
to use. We could expose AdjustToFullTransactionId as well, but I feel we could
do that separately.
>
>
> 2.
> + /*
> + * We expect the publisher and subscriber clocks to be in sync using
> + time
> + * sync service like NTP. Otherwise, we will advance this worker's
> + * oldest_nonremovable_xid prematurely, leading to the removal of rows
> + * required to detect update_delete conflict.
> + *
> + * XXX Consider waiting for the publisher's clock to catch up with the
> + * subscriber's before proceeding to the next phase.
> + */
> + if (TimestampDifferenceExceeds(data->reply_time,
> + data->candidate_xid_time, 0))
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("oldest_nonremovable_xid transaction ID may be advanced
> + prematurely"), errdetail("The clock on the publisher is behind that of
> + the subscriber."));
>
>
> I don't fully understand the purpose of this check. Based on the comments in
> RetainConflictInfoData, if I understand correctly, candidate_xid_time
> represents the time when the candidate is determined, and reply_time
> indicates the time of the reply from the publisher. Why do we expect these two
> timestamps to have zero difference to ensure clock synchronization?
The check is to prevent the case when the publisher's clock is behind of the
subscriber's. Normally, the reply_time should be greater than the
candidate_xid_time if clock is in sync, but if the publisher's clock is behind,
then reply_time could be earlier than the candidate_xid_time, in which case
this ERROR would be triggered.
>
> 3.
> + /*
> + * Use last_recv_time when applying changes in the loop; otherwise, get
> + * the latest timestamp.
> + */
> + now = data->last_recv_time ? data->last_recv_time :
> + GetCurrentTimestamp();
>
> Can you explain in the comment what's the logic behind using last_recv_time
> here? Why not just compare 'candidate_xid_time' vs current timestamp?
The intention is to avoid unnecessary system time retrieval when we already
have the last_recv_time that can be directly used here (since it's fetched just
before invoking this function). It may not save a lot but It looks efficient to
reuse the existing timestamp when available. I added the same in comments.
>
> 4.
> Comment of v26-0004 doesn't clearly explain that once retention stopped after
> reaching 'max_conflict_retention_duration' will it resume back?
I Added some in commit message and comments of 0004.
Here is the V27 patch set which addressed comment above.
Best Regards,
Hou zj
Attachment | Content-Type | Size |
---|---|---|
v27-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch | application/octet-stream | 40.8 KB |
v27-0002-Maintain-the-replication-slot-in-logical-launche.patch | application/octet-stream | 21.5 KB |
v27-0003-Add-a-retain_conflict_info-option-to-subscriptio.patch | application/octet-stream | 80.0 KB |
v27-0004-add-a-max_conflict_retention_duration-subscripti.patch | application/octet-stream | 91.5 KB |
v27-0005-Re-create-the-replication-slot-if-the-conflict-r.patch | application/octet-stream | 12.0 KB |
v27-0006-Add-a-tap-test-to-verify-the-management-of-the-n.patch | application/octet-stream | 7.0 KB |
v27-0007-Support-the-conflict-detection-for-update_delete.patch | application/octet-stream | 25.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Álvaro Herrera | 2025-02-05 11:49:14 | Re: Make COPY format extendable: Extract COPY TO format implementations |
Previous Message | Bertrand Drouvot | 2025-02-05 11:03:25 | Re: Add isolation test template in injection_points for wait/wakeup/detach |