Re: Fix orphaned 2pc file which may cause instance restart to fail

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: 清浅 <drec(dot)wu(at)foxmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix orphaned 2pc file which may cause instance restart to fail
Date: 2024-09-11 09:21:37
Message-ID: ZuFhIUABouliP__O@paquier.xyz
Lists: pgsql-hackers

On Sun, Sep 08, 2024 at 01:01:37PM +0800, 清浅 wrote:
> Hi all, I found a race condition between two global transactions
> that can cause an instance restart to fail with the error 'could not
> access status of transaction xxx', with detail 'Could not open file
> "pg_xact/xxx": No such file or directory'.
>
>
> The scenario to reproduce the problem is:
> 1. gxact1 is running `FinishPreparedTransaction` when a checkpoint
> is issued, so a 2pc file is generated for gxact1.
> 2. gxact1 is then removed from `TwoPhaseState->prepXacts` and
> its state memory is returned to the freelist.
> 3. Just before gxact1 removes its 2pc file, gxact2 is issued;
> gxact2 reuses the same state memory as gxact1 and
> resets `gxact->ondisk` to false.
> 4. gxact1 continues, finds that `gxact->ondisk` is false, and does not
> remove its 2pc file. The file is orphaned.
>
> If gxact1's local xid is not frozen, the startup process will remove
> the orphaned 2pc file. However, if the xid's corresponding clog file has
> been truncated by `vacuum`, the startup process raises the error 'could
> not access status of transaction xxx', because it cannot find the
> transaction's status file in the `pg_xact` directory.

Hmm.  I've not seen that in the field.  Let me check that.
--
Michael
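
For readers following the quoted report, the four steps can be replayed
as a small single-threaded simulation. This is only a sketch under
stated assumptions, not PostgreSQL code: `GXactSlot`, `prepare`,
`checkpoint`, `release_slot` and the `file_exists` array are
hypothetical stand-ins for the shared 2PC state, the freelist and the
2pc files. The second function shows one possible way to avoid the
stale read (copying the flag before the slot is released); the thread
at this point does not confirm that this is the fix that was adopted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one shared 2PC state slot. */
typedef struct GXactSlot
{
    uint32_t xid;     /* local xid of the prepared transaction */
    bool     ondisk;  /* true once a checkpoint wrote the 2pc file */
    bool     in_use;  /* false while the slot sits on the freelist */
} GXactSlot;

/* A single slot, so the second transaction is forced to reuse it. */
static GXactSlot slot;

/* Simulated 2pc files: file_exists[xid] for a few tiny xids. */
static bool file_exists[8];

static void reset_state(void)
{
    memset(&slot, 0, sizeof(slot));
    memset(file_exists, 0, sizeof(file_exists));
}

/* Take the slot from the "freelist"; reuse resets ondisk to false. */
static GXactSlot *prepare(uint32_t xid)
{
    assert(!slot.in_use);
    slot.xid = xid;
    slot.ondisk = false;
    slot.in_use = true;
    return &slot;
}

/* A checkpoint writes the 2pc file and marks the state as on disk. */
static void checkpoint(GXactSlot *g)
{
    file_exists[g->xid] = true;
    g->ondisk = true;
}

/* Return the slot to the "freelist". */
static void release_slot(GXactSlot *g)
{
    g->in_use = false;
}

/* Replays steps 1-4; returns true if the file of xid 1 is orphaned. */
bool finish_racy(void)
{
    reset_state();
    GXactSlot *g1 = prepare(1);
    checkpoint(g1);         /* step 1: 2pc file written for gxact1 */
    release_slot(g1);       /* step 2: state memory back on freelist */
    prepare(2);             /* step 3: gxact2 reuses it, ondisk = false */
    if (g1->ondisk)         /* step 4: this now reads gxact2's flag */
        file_exists[1] = false;
    return file_exists[1];
}

/* Same interleaving, but the flag is copied before the release. */
bool finish_with_saved_flag(void)
{
    reset_state();
    GXactSlot *g1 = prepare(1);
    checkpoint(g1);
    bool ondisk = g1->ondisk;   /* snapshot while the slot is still ours */
    release_slot(g1);
    prepare(2);
    if (ondisk)
        file_exists[1] = false;
    return file_exists[1];
}
```

In this model `finish_racy()` returns true (the file for xid 1 is left
behind) and `finish_with_saved_flag()` returns false. A real server
would also need appropriate locking around these steps, which the
sketch leaves out.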
