From: | 清浅 <drec(dot)wu(at)foxmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Fix orphaned 2pc file which may cause instance restart failed |
Date: | 2024-09-08 05:01:37 |
Message-ID: | tencent_A7F059B5136A359625C7B2E4A386B3C3F007@qq.com |
Lists: | pgsql-hackers |
Hi all, I found a race condition between two global transactions that may cause an instance restart to fail with the error 'could not access status of transaction xxx' (detail: 'Could not open file "pg_xact/xxx": No such file or directory').
The scenario to reproduce the problem is:
1. gxact1 is running `FinishPreparedTransaction` when a checkpoint is issued, so gxact1's state is written to a 2pc file.
2. Then gxact1 is removed from `TwoPhaseState->prepXacts` and its state memory is returned to the freelist.
3. Just before gxact1 removes its 2pc file, gxact2 is issued; gxact2 reuses the same state memory as gxact1 and resets `gxact->ondisk` to false.
4. gxact1 continues, finds that `gxact->ondisk` is false, and therefore does not remove its 2pc file. The file is orphaned.
If gxact1's local xid is not frozen, the startup process removes the orphaned 2pc file. However, if the clog file corresponding to that xid has been truncated by `vacuum`, the startup process raises the error 'could not access status of transaction xxx', because it cannot find the transaction's status file in the `pg_xact` directory.
A potential fix is attached.
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-orphaned-2pc-file-which-may-casue-instance-resta.patch | application/octet-stream | 2.1 KB |