RE: Slow catchup of 2PC (twophase) transactions on replica in LR

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Vitaly Davydov' <v(dot)davydov(at)postgrespro(dot)ru>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-07-08 07:04:40
Message-ID: OSBPR01MB2552AF02B039F00EF7791401F5DA2@OSBPR01MB2552.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Vitaly,

Thanks for the comments! Please see the attached new version of the patch.

> Thank you very much for the patch. In general, it seems to work well for me, but
> there seems to be a memory access problem in libpqrcv_alter_slot ->
> quote_identifier in case of NULL slot_name. It happens, if the two_phase option
> is altered on a subscription without slot. I think, a simple check for NULL may
> fix the problem. I guess, the same problem may be for failover option.

You are right. The failover option requires that slot_name be valid. For
two_phase, we need to connect to the publisher only when altering it from "true"
to "false", so slot_name must be set only in that case. Updated.
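
To illustrate the rule above, here is a small Python sketch. It is a toy model,
not the C code from the patch: the function names and the ValueError are
hypothetical, and only the rule itself (contact the publisher only for the
true -> false transition, and require slot_name only then) comes from the
description above.

```python
from typing import Optional


def needs_publisher_connection(old_two_phase: bool, new_two_phase: bool) -> bool:
    """Hypothetical helper: True only for the true -> false transition."""
    return old_two_phase and not new_two_phase


def check_alter_two_phase(old_two_phase: bool, new_two_phase: bool,
                          slot_name: Optional[str]) -> bool:
    """Return True if the slot on the publisher would have to be altered.

    slot_name is inspected only when the publisher must be contacted, so a
    subscription without a slot can still change two_phase in other cases.
    Raising ValueError for a missing slot is an assumption of this sketch.
    """
    if not needs_publisher_connection(old_two_phase, new_two_phase):
        return False
    if slot_name is None:
        raise ValueError("two_phase cannot be disabled on the publisher "
                         "without a slot name")
    return True
```

With this model, check_alter_two_phase(False, True, None) returns False without
touching slot_name, while check_alter_two_phase(True, False, None) raises.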

> Another possible problem is related to my use case. I haven't reproduced this
> case, just some thoughts. I guess, when two_phase is ON, the PREPARE statement
> may be truncated from the WAL at checkpoint, but COMMIT PREPARED is still kept
> in the WAL. On catchup, I would ask the master to send transactions from some
> restart LSN. I would like to get all such transactions completely, with their
> bodies, not only COMMIT PREPARED messages.

I don't think this is a real issue. WAL for prepared transactions is retained
until they are committed or aborted.
When two_phase is on and transactions are PREPAREd, they are not cleaned up
from memory (see ReorderBufferProcessTXN()). A RUNNING_XACTS record then
triggers an update of the slot's restart_lsn, but it cannot move forward
because ReorderBufferGetOldestTXN() returns the prepared transaction (see
SnapBuildProcessRunningXacts()). The restart_decoding_lsn of each transaction,
which is a candidate for the slot's restart_lsn, is always behind the start
point of its txn.
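
The argument above can be sketched as a toy Python model. This is not
PostgreSQL code: the Txn class and candidate_restart_lsn() are made-up names,
LSNs are plain integers, and the "no tracked transactions" branch is a
simplification of what the server really does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Txn:
    xid: int
    first_lsn: int             # where the transaction starts in the WAL
    restart_decoding_lsn: int  # assumed <= first_lsn, as argued above


def candidate_restart_lsn(tracked: List[Txn], current_lsn: int) -> int:
    """Pick the candidate restart_lsn from the oldest tracked transaction.

    A prepared transaction stays in `tracked` until COMMIT PREPARED or
    ROLLBACK PREPARED, so it keeps pinning the candidate. With nothing
    tracked, this toy model simply lets the candidate reach current_lsn.
    """
    if not tracked:
        return current_lsn
    oldest = min(tracked, key=lambda t: t.first_lsn)
    return oldest.restart_decoding_lsn


prepared = Txn(xid=700, first_lsn=1000, restart_decoding_lsn=900)
later = Txn(xid=701, first_lsn=1500, restart_decoding_lsn=1400)

# While the prepared transaction is unresolved, the candidate stays at 900.
pinned = candidate_restart_lsn([prepared, later], current_lsn=2000)

# Once it is resolved and dropped, the candidate can move to 1400.
advanced = candidate_restart_lsn([later], current_lsn=3000)
```

In this model the WAL from LSN 900 onward, which includes the whole body of the
prepared transaction, stays needed by the slot until that transaction is
resolved.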

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

Attachment                                                       Content-Type              Size
v16-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIP.patch  application/octet-stream  27.8 KB
v16-0002-Alter-slot-option-two_phase-only-when-altering-t.patch  application/octet-stream  12.0 KB
v16-0003-Abort-prepared-transactions-while-altering-two_p.patch  application/octet-stream  11.4 KB
v16-0004-Add-force_alter-option-for-ALTER-SUBSCRIPTION-.-.patch  application/octet-stream  56.7 KB
