From: | Waka Ranai <wakadotranai(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file |
Date: | 2024-05-22 16:14:31 |
Message-ID: | CAP8Vo=9o0FE6gzqZJ3XdeGPNqi=eNV3cM_6v-thE640YcYoWog@mail.gmail.com |
Lists: | pgsql-bugs |
Hello,
I tested the pg_upgrade tool many times on different servers (always
Windows Server 2019, though the exact build may differ), trying to upgrade an
existing database from PostgreSQL 9.6 to PostgreSQL 15 (I tried both 15.4.2
and 15.7), and almost every time I ran into this issue during the step
“Setting next transaction ID and epoch for new cluster”.
Here is the version of one of the servers on which it failed at least three
times:
[image: image.png]
The command I ran, after setting PGPASSWORD to the correct password, was:

"C:\Program Files\PostgreSQL\15\bin\pg_upgrade.exe" -d "C:\Program Files\PostgreSQL\9.6\data" -D "C:\Program Files\PostgreSQL\15\data" -b "C:\Program Files\PostgreSQL\9.6\bin" -B "C:\Program Files\PostgreSQL\15\bin" -U postgres

The error was either “pg_resetwal: error: could not delete file
"pg_wal/000000010000000000000001": Permission denied” or, sometimes, a
message saying that the file could not be found instead of “Permission
denied”. When I watch the directory while pg_upgrade is running, I can see
that the file is present beforehand, and it is always gone after pg_upgrade
crashes. I used Process Explorer to inspect which processes were using the
file: they were always postgres processes, and only one right after a fresh
install of PostgreSQL 15, but during the execution of pg_upgrade I sometimes
saw two processes using it.
I suspect some sort of race condition: one process sees that the file
exists, does something with it and deletes it, while another process also
saw the file but, by the time it tries to delete it, can no longer find it.
Looking at the code, I believe it happens in the function KillExistingXLOG,
starting at line 973 of pg_resetwal.c (
https://github.com/postgres/postgres/blob/master/src/bin/pg_resetwal/pg_resetwal.c#L973),
though I cannot be entirely sure of the cause.
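To make the suspected race concrete, here is a minimal sketch in Python of a directory sweep that lists pg_wal and then deletes the WAL-like files it found. It is not pg_resetwal's actual C code: the helper name remove_wal_segments and the simplified 24-hex-digit name filter are assumptions for illustration only. The gap between listing a name and deleting it is where a second process could remove the file first, which would match the "could not be found" variant of the error; the "Permission denied" variant may have a different cause (for example, the file still being open in another process on Windows).

```python
import os
import re

# Assumed, simplified filter: a WAL segment name is 24 uppercase hex digits
# (timeline, log and segment numbers, 8 digits each). pg_resetwal's own
# filter may accept more (for example partial segments).
WAL_NAME = re.compile(r"[0-9A-F]{24}")


def remove_wal_segments(wal_dir: str) -> list[str]:
    """Hypothetical helper: delete WAL-like files in wal_dir.

    Not pg_resetwal's code; an illustration of a list-then-delete sweep
    that tolerates files disappearing underneath it. Returns the names
    this call actually removed.
    """
    removed = []
    # listdir() is a snapshot; by the time remove() runs, the entry may be gone.
    for name in os.listdir(wal_dir):
        if not WAL_NAME.fullmatch(name):
            continue  # skip archive_status/ and anything that is not a segment
        try:
            os.remove(os.path.join(wal_dir, name))
        except FileNotFoundError:
            # Someone else deleted it between listdir() and remove(): the
            # check-then-act window suspected above. Treat it as already done.
            continue
        removed.append(name)
    # PermissionError is deliberately not caught: on Windows it can mean the
    # file is still open in another process, which is a different problem.
    return removed
```

Only FileNotFoundError is treated as harmless here, on the reasoning that "already deleted" leaves the directory in the desired state, whereas a permission failure leaves the file behind and should still be reported.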
Attached are the logs produced by the pg_upgrade tool with the verbose
option.
Thanks in advance for investigating. I hope to better understand the
problem and to see a fix soon, as it is complicating the deployment of a
major upgrade of our product.
Have a great day,
Thomas
Attachment | Content-Type | Size |
---|---|---|
pg_upgrade_output.d.zip | application/x-zip-compressed | 72.3 KB |