Re: Fix orphaned 2pc file which may casue instance restart failed

From: ChengWen Wu <drec(dot)wu(at)foxmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix orphaned 2pc file which may casue instance restart failed
Date: 2024-10-09 07:51:59
Message-ID: tencent_CA843A8385CB3130B9ABC1E55023FC4E4D05@qq.com
Lists: pgsql-hackers

Hi Michael,

Is there any progress on this problem? I can provide more detailed information if you need it.

Best wishes,
Chengwen Wu

------------------ Original ------------------
From: "Michael Paquier" <michael(at)paquier(dot)xyz>
Date: Wed, Sep 11, 2024 05:21 PM
To: "清浅" <drec(dot)wu(at)foxmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix orphaned 2pc file which may casue instance restart failed

On Sun, Sep 08, 2024 at 01:01:37PM +0800, 清浅 wrote:
> Hi all, I found that there is a race condition
> between two global transactions, which may cause an instance restart
> to fail with the error 'could not access status of transaction
> xxx","Could not open file ""pg_xact/xxx"": No such file or
> directory'.
>
> The scenario to reproduce the problem is:
>   1. gxact1 is executing `FinishPreparedTransaction` when a checkpoint
>      is issued, so a 2pc file is generated for gxact1.
>   2. gxact1 is then removed from `TwoPhaseState->prepXacts` and
>      its state memory is returned to the freelist.
>   3. Just before gxact1 removes its 2pc file, gxact2 is issued;
>      gxact2 reuses the state memory of gxact1 and
>      resets `gxact->ondisk` to false.
>   4. gxact1 continues, finds that `gxact->ondisk` is false, and does not
>      remove its 2pc file. This file is orphaned.
>
> If gxact1's local xid is not frozen, the startup process will remove
> the orphaned 2pc file. However, if the xid's corresponding clog file has
> been truncated by `vacuum`, the startup process will raise the error
> 'could not access status of transaction xxx' because it cannot find the
> transaction's status file in the `pg_xact` directory.

Hmm.  I've not seen that in the field.  Let me check that..
--
Michael
