Fix orphaned 2pc file which may cause instance restart to fail

From: 清浅 <drec(dot)wu(at)foxmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Fix orphaned 2pc file which may cause instance restart to fail
Date: 2024-09-08 05:01:37
Message-ID: tencent_A7F059B5136A359625C7B2E4A386B3C3F007@qq.com
Lists: pgsql-hackers

Hi all,

I found a race condition between two global transactions that can cause
an instance restart to fail with the error 'could not access status of
transaction xxx' and the detail 'Could not open file "pg_xact/xxx": No
such file or directory'.

The scenario to reproduce the problem is:

  1. gxact1 is running `FinishPreparedTransaction` when a checkpoint is
     issued, so a 2pc file is generated for gxact1.
  2. gxact1 is then removed from `TwoPhaseState->prepXacts` and its
     state memory is returned to the freelist.
  3. Just before gxact1 removes its 2pc file, gxact2 is issued. gxact2
     reuses the state memory of gxact1 and resets `gxact->ondisk` to
     false.
  4. gxact1 continues, finds that `gxact->ondisk` is false, and does not
     remove its 2pc file. The file is now orphaned.
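
The interleaving above can be sketched as a small single-threaded toy
model. This is only an illustration, not PostgreSQL code: `ToyGXact`,
`toy_alloc`, `toy_release`, `file_exists` and `simulate_finish_prepared`
are hypothetical names, and the one-slot pool stands in for the real
freelist so that reuse is guaranteed. The `snapshot_ondisk_first` flag
shows one possible way to avoid the race (reading `ondisk` before the
slot is released); it is an assumption and not necessarily what the
attached patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a prepared-transaction state slot (hypothetical). */
typedef struct ToyGXact
{
    bool ondisk;   /* true once a checkpoint has written the 2pc file */
    bool in_use;   /* true while the slot belongs to a transaction */
} ToyGXact;

static ToyGXact slot;     /* one-slot "freelist": the next alloc reuses it */
static bool file_exists;  /* models gxact1's 2pc file on disk */

/* Hand out the slot and reset its state, as a new gxact would. */
static ToyGXact *toy_alloc(void)
{
    assert(!slot.in_use);
    slot.in_use = true;
    slot.ondisk = false;
    return &slot;
}

/* Return the slot to the freelist; its memory may be reused at once. */
static void toy_release(ToyGXact *g)
{
    g->in_use = false;
}

/*
 * Replays steps 1-4 in order. Returns true if gxact1's 2pc file is
 * left behind (orphaned) after gxact1 has finished.
 */
bool simulate_finish_prepared(bool snapshot_ondisk_first)
{
    ToyGXact *gxact1 = toy_alloc();

    /* Step 1: a checkpoint writes gxact1's 2pc file. */
    file_exists = true;
    gxact1->ondisk = true;

    /* Possible mitigation: remember the flag while the slot is still ours. */
    bool ondisk_snapshot = gxact1->ondisk;

    /* Step 2: gxact1 is removed and its slot goes back to the freelist. */
    toy_release(gxact1);

    /* Step 3: gxact2 reuses the same slot and resets ondisk to false. */
    ToyGXact *gxact2 = toy_alloc();

    /* Step 4: gxact1 decides whether to remove its 2pc file. */
    bool should_remove = snapshot_ondisk_first ? ondisk_snapshot
                                               : gxact1->ondisk;
    if (should_remove)
        file_exists = false;

    bool orphaned = file_exists;

    /* Reset the toy state so the function can be called again. */
    toy_release(gxact2);
    file_exists = false;
    return orphaned;
}
```

In this model, calling `simulate_finish_prepared(false)` follows the
racy order and leaves the file behind, while
`simulate_finish_prepared(true)` removes it because the decision no
longer depends on memory that gxact2 has already overwritten.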

If gxact1's local xid has not been frozen, the startup process will
remove the orphaned 2pc file. However, if the clog file corresponding to
that xid has already been truncated by `vacuum`, the startup process
raises the error 'could not access status of transaction xxx', because
it cannot find the transaction's status file in the `pg_xact` directory.

A potential fix is attached.

Attachment Content-Type Size
0001-Fix-orphaned-2pc-file-which-may-casue-instance-resta.patch application/octet-stream 2.1 KB
