Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file

From: Waka Ranai <wakadotranai(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file
Date: 2024-05-22 16:14:31
Message-ID: CAP8Vo=9o0FE6gzqZJ3XdeGPNqi=eNV3cM_6v-thE640YcYoWog@mail.gmail.com
Lists: pgsql-bugs

Hello,

I have run the pg_upgrade tool many times on different servers (always
Windows Server 2019, though the exact build may differ) to upgrade an
existing database from Postgres 9.6 to Postgres 15 (I tried both 15.4.2
and 15.7), and almost every time it failed during the step
“Setting next transaction ID and epoch for new cluster”.

Here is the version of one of the servers, on which it failed at least
three times:

[image: image.png]

After setting PGPASSWORD to the correct password, I ran this command:

"C:\Program Files\PostgreSQL\15\bin\pg_upgrade.exe" -d "C:\Program Files\PostgreSQL\9.6\data" -D "C:\Program Files\PostgreSQL\15\data" -b "C:\Program Files\PostgreSQL\9.6\bin" -B "C:\Program Files\PostgreSQL\15\bin" -U postgres

The error was either “pg_resetwal: error: could not delete file
"pg_wal/000000010000000000000001": Permission denied” or, sometimes, the
same message reporting that the file could not be found. When I watch the
directory while pg_upgrade is running, I can see that the file is present
beforehand and is always gone after pg_upgrade crashes. I used Process
Explorer to check which processes had the file open: they were always
postgres processes. After a fresh install of Postgres 15 only one process
was using it, but during the execution of pg_upgrade I sometimes saw two.

I suspect that there is some sort of race condition where one process sees
that the file exists, does something with it and deletes it, while another
process also sees the file but can no longer find it when it tries to
delete it. I had a look at the code and I believe this happens in the
function KillExistingXLOG, at line 973 of pg_resetwal.c (
https://github.com/postgres/postgres/blob/master/src/bin/pg_resetwal/pg_resetwal.c#L973),
though I cannot be entirely sure of the cause.
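
To make the suspected pattern concrete, here is a minimal Python sketch of
a "list the directory, then delete each file" loop. It is not the
pg_resetwal code. The function names and the file name pattern are
assumptions made up for illustration. It only shows how a file that is
removed by another process between the listing and the delete leads to a
"not found" error, and how a tolerant loop could record that instead of
failing. It does not address the "Permission denied" variant.

```python
import os
import re

# Assumption: a WAL segment name is 24 uppercase hex digits, optionally
# followed by a dotted suffix. This regex is illustrative only.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}(\.\w+)?$")


def list_wal_segments(wal_dir):
    """Time of check: return the segment-like file names seen in wal_dir."""
    return sorted(n for n in os.listdir(wal_dir) if WAL_SEGMENT_RE.match(n))


def remove_listed(wal_dir, names):
    """Time of use: delete each listed file, tolerating files that vanished.

    Returns (removed, vanished): names deleted by this call, and names that
    were listed earlier but no longer existed when the delete was attempted.
    """
    removed, vanished = [], []
    for name in names:
        try:
            os.unlink(os.path.join(wal_dir, name))
        except FileNotFoundError:
            # Something else removed the file after it was listed.
            vanished.append(name)
        else:
            removed.append(name)
    return removed, vanished
```

If a second process deletes a segment between the call to
list_wal_segments and the call to remove_listed, that name ends up in
"vanished" rather than aborting the whole run.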

The logs produced by pg_upgrade, run with the verbose option, are
attached.

Thanks in advance for the investigation. I hope to understand the problem
better and to see a fix soon, as it is complicating the deployment of a
major upgrade of our product.

Have a great day,

Thomas

Attachment Content-Type Size
pg_upgrade_output.d.zip application/x-zip-compressed 72.3 KB
