Re: An improvement of ProcessTwoPhaseBuffer logic

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: An improvement of ProcessTwoPhaseBuffer logic
Date: 2024-12-25 22:42:34
Message-ID: Z2yKWtB4iB8BdEiI@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 25, 2024 at 07:18:18PM +0300, Vitaly Davydov wrote:
> I tried your patch and it seems the server is started
> successfully. But I've found another problem in my synthetic test -
> it can not remove the file with the following message:
>
> LOG: database system was not properly shut down; automatic recovery in progress
> WARNING: removing future two-phase state file for transaction 1045224
> WARNING: could not remove file "pg_twophase/FFFFFFFF000FF2E8": No such file or directory
> LOG: redo starts at 0/1762978

That works properly up to v16~. I have slept over this problem and I
think that we'd better backpatch the fix for the first problem. Now,
I also want to add a test to provide coverage. As you say, that's
simple:
- Create an empty file with a future XID name
- Restart the server.
- Scan the logs for the WARNING where the future file is removed.

> The fill will never be removed automatically.
> I guess, it is because we incorrectly calculate the two-phase file
> name using TwoPhaseFilePath in RemoveTwoPhaseFile in this
> scenario. It can be fixed if to pass file path directly from
> RecoverPreparedTransactions or StandbyRecoverPreparedTransaction
> into ProcessTwoPhaseBuffer -> RemoveTwoPhaseFile.

Yeah, I've noticed that as well. We are dealing with a second bug
that affects 17~ caused by the switch to FullTransactionIds for 2PC
file names, which had been introduced in 5a1dfde8334b (pretty sure
that it is the culprit, did not bisect). Attempting to remove a
future file triggers an assertion failure:
#3 0x00007fed046324f0 in __GI_abort () at ./stdlib/abort.c:79
#4 0x0000563e68a9bd07 in ExceptionalCondition
(conditionName=0x563e68c66fc3 "epoch > 0", fileName=0x563e68c66b0b
"twophase.c", lineNumber=953) at assert.c:66
#5 0x0000563e67683060 in AdjustToFullTransactionId (xid=4095) at
twophase.c:953
#6 0x0000563e67683092 in TwoPhaseFilePath (path=0x7ffcb515b4a0
"h\307!\227>V", xid=4095) at twophase.c:963
#7 0x0000563e67686603 in RemoveTwoPhaseFile (xid=4095,
giveWarning=true) at twophase.c:1728
#8 0x0000563e67688989 in ProcessTwoPhaseBuffer (xid=4095,
prepare_start_lsn=0, fromdisk=true, setParent=false, setNextXid=false)
at twophase.c:2219
#9 0x0000563e676872a6 in restoreTwoPhaseData () at twophase.c:1924
#10 0x0000563e676b5c8b in StartupXLOG () at xlog.c:5642

So this means that we are dealing with two different bugs, and that we
need to fix the assertion failure of the second problem first down to
17, then fix the first problem on all stable branches with a test to
cover both the first and second problems.

> I did it in the proposed patch
> https://www.postgresql.org/message-id/cedbe-65e0c000-1-6db17700%40133269862
> (it is incomplete now).

I suspect that this can be still dangerous as-is while complicating
the code with more possible paths for the removal of the 2PC files,
because we may still build a file name from an XID, and that's what's
causing this issue..
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-12-25 22:48:49 Re: Fix typo in comment of compute_return_type()
Previous Message Michel Pelletier 2024-12-25 20:25:18 Re: Using Expanded Objects other than Arrays from plpgsql