RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-12-17 03:24:47
Message-ID: OS0PR01MB57169619BD3AF79CA2D2A59894042@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, December 16, 2024 7:21 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

Hi,

>
> On Wed, Dec 11, 2024 at 2:32 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Attach the V16 patch set which addressed above comments.
> >
> > There is a new 0002 patch where I tried to dynamically adjust the interval for
> > advancing the transaction ID. Instead of always waiting for
> > wal_receiver_status_interval, we can start with a short interval and increase
> > it if there is no activity (no xid assigned on subscriber), but not beyond
> > wal_receiver_status_interval.
> >
> > The intention is to more effectively advance xid to avoid retaining too much
> > dead tuples. My colleague will soon share detailed performance data and
> > analysis related to this enhancement.
>
> I am starting to review the patches, and trying to understand the
> concept that how you are preventing vacuum to remove the dead tuple
> which might required by the concurrent remote update, so I was looking
> at the commit message which explains the idea quite clearly but I have
> one question

Thanks for the review!

>
> The process of advancing the non-removable transaction ID in the apply worker
> involves:
>
> == copied from commit message of 0001 start==
> 1) Call GetOldestActiveTransactionId() to take oldestRunningXid as the
> candidate xid.
> 2) Send a message to the walsender requesting the publisher status, which
> includes the latest WAL write position and information about transactions
> that are in the commit phase.
> 3) Wait for the status from the walsender. After receiving the first status, do
> not proceed if there are concurrent remote transactions that are still in the
> commit phase. These transactions might have been assigned an earlier commit
> timestamp but have not yet written the commit WAL record. Continue to
> request
> the publisher status until all these transactions have completed.
> 4) Advance the non-removable transaction ID if the current flush location has
> reached or surpassed the last received WAL position.
> == copied from commit message of 0001 start==
>
> So IIUC in step 2) we send the message and get the list of all the
> transactions which are in the commit phase? What do you exactly mean by a
> transaction which is in the commit phase?

I was referring to transactions calling RecordTransactionCommit() and have
entered the commit critical section. In the patch, we checked if the proc has
marked the new flag DELAY_CHKPT_IN_COMMIT in 'MyProc->delayChkptFlags'.

> Can I assume transactions which are currently running on the publisher?

I think it's a subset of the running transactions. We only get the transactions
in commit phase with the intention to avoid delays caused by waiting for
long-running transactions to complete, which can result in the long retention
of dead tuples.

We decided to wait for running(committing) transactions due to the WAL/LSN
inversion issue[1]. The original idea is to directly return the latest WAL
write position without checking running transactions. But since there is a gap
between when we acquire the commit_timestamp and the commit LSN, it's possible
the transactions might have been assigned an earlier commit timestamp but have
not yet written the commit WAL record.

> And in step 3) we wait for all the transactions to get committed which we saw
> running (or in the commit phase) and we anyway don't worry about the newly
> started transactions as they would not be problematic for us. And in step 4)
> we would wait for all the flush location to reach "last received WAL
> position", here my question is what exactly will be the "last received WAL
> position" I assume it would be the position somewhere after the position of
> the commit WAL of all the transaction we were interested on the publisher?

Yes, your understanding is correct. It's a position after the position of all
the interesting transactions. In the patch, we get the latest WAL write
position(GetXLogWriteRecPtr()) in walsender after all interesting transactions
have finished and reply it to apply worker.

> At high level the overall idea looks promising to me but wanted to put
> more thought on lower level details about what transactions exactly we
> are waiting for and what WAL LSN we are waiting to get flushed.

Yeah, that makes sense, thanks.

[1] https://www.postgresql.org/message-id/OS0PR01MB571628594B26B4CC2346F09294592%40OS0PR01MB5716.jpnprd01.prod.outlook.com>

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wenhui qiu 2024-12-17 03:31:54 Re: Add 64-bit XIDs into PostgreSQL 15
Previous Message Thomas Munro 2024-12-17 03:11:37 Re: Regression tests fail on OpenBSD due to low semmns value