Re: Conflict detection for update_deleted in logical replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-12-16 11:21:20
Message-ID: CAFiTN-sUsh86PQi=SgUKba8Z+2NBXFs9XWihyzKobZ8KDeYsBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 11, 2024 at 2:32 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Attach the V16 patch set which addressed above comments.
>
> There is a new 0002 patch where I tried to dynamically adjust the interval for
> advancing the transaction ID. Instead of always waiting for
> wal_receiver_status_interval, we can start with a short interval and increase
> it if there is no activity (no xid assigned on subscriber), but not beyond
> wal_receiver_status_interval.
>
> The intention is to more effectively advance xid to avoid retaining too much
> dead tuples. My colleague will soon share detailed performance data and
> analysis related to this enhancement.

I am starting to review the patches, and trying to understand the
concept that how you are preventing vacuum to remove the dead tuple
which might required by the concurrent remote update, so I was looking
at the commit message which explains the idea quite clearly but I have
one question

The process of advancing the non-removable transaction ID in the apply worker
involves:

== copied from commit message of 0001 start==
1) Call GetOldestActiveTransactionId() to take oldestRunningXid as the
candidate xid.
2) Send a message to the walsender requesting the publisher status, which
includes the latest WAL write position and information about transactions
that are in the commit phase.
3) Wait for the status from the walsender. After receiving the first status, do
not proceed if there are concurrent remote transactions that are still in the
commit phase. These transactions might have been assigned an earlier commit
timestamp but have not yet written the commit WAL record. Continue to request
the publisher status until all these transactions have completed.
4) Advance the non-removable transaction ID if the current flush location has
reached or surpassed the last received WAL position.
== copied from commit message of 0001 start==

So IIUC in step 2) we send the message and get the list of all the
transactions which are in the commit phase? What do you exactly mean
by a transaction which is in the commit phase? Can I assume
transactions which are currently running on the publisher? And in
step 3) we wait for all the transactions to get committed which we saw
running (or in the commit phase) and we anyway don't worry about the
newly started transactions as they would not be problematic for us.
And in step 4) we would wait for all the flush location to reach "last
received WAL position", here my question is what exactly will be the
"last received WAL position" I assume it would be the position
somewhere after the position of the commit WAL of all the transaction
we were interested on the publisher?

At high level the overall idea looks promising to me but wanted to put
more thought on lower level details about what transactions exactly we
are waiting for and what WAL LSN we are waiting to get flushed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nishant Sharma 2024-12-16 11:50:24 Re: on_error table, saving error info to a table
Previous Message Peter Eisentraut 2024-12-16 11:20:39 Re: remove pgrminclude?