Re: Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file

From: Waka Ranai <wakadotranai(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file
Date: 2024-07-02 09:57:28
Message-ID: CAP8Vo=9ib4wxrYt3NdwwL8t8bPG4=LafoiZCSa+chZRzB=30TA@mail.gmail.com
Lists: pgsql-bugs

Hi again,

I eventually found out that Cortex XDR was also installed on the
system, but even after uninstalling it I still face the same issue.
I tried to monitor which processes might hold a handle on the file,
but the only ones shown belong to postgres (one from postgres.exe and one
from pg_resetwal). I did the monitoring with the Resource Monitor bundled
with Windows. Can you recommend another monitoring tool, perhaps one with
automatic scanning?
How can I make sure that the issue is not caused by an internal postgres
process?
Did you consider not failing the upgrade if the file cannot be deleted?
What would be the problems, if any, in that case?
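
To make the question concrete, here is a minimal sketch of the "do not fail
if the file is already gone" idea. It is written in Python purely for
illustration and is not pg_resetwal's code: the function name
remove_wal_segments and the 24-hex-character filename filter are assumptions
made up for this example. It only shows the control flow (list the
directory, try to delete each segment, treat "no such file" as success) and
says nothing about whether that would be safe inside pg_upgrade.

```python
import os


def remove_wal_segments(wal_dir):
    """Delete every entry in wal_dir whose name looks like a WAL segment.

    A file that vanished between the directory listing and the delete call
    is recorded as skipped instead of aborting the whole run.
    """
    removed, skipped = [], []
    for name in sorted(os.listdir(wal_dir)):
        # Hypothetical filter: 24 uppercase hex characters,
        # e.g. 000000010000000000000001.
        if len(name) != 24 or any(c not in "0123456789ABCDEF" for c in name):
            continue
        try:
            os.unlink(os.path.join(wal_dir, name))
            removed.append(name)
        except FileNotFoundError:
            # The goal was for the file to be gone, and it already is.
            skipped.append(name)
    return removed, skipped
```

Any other error (for example a permission problem) still propagates, so
only the "already deleted" case is tolerated.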
Thanks in advance

On Fri, 31 May 2024 at 14:01, Waka Ranai <wakadotranai(at)gmail(dot)com> wrote:

> Hello again, I tested after disabling the Microsoft antivirus entirely and
> it worked the first time. I then completely uninstalled the new
> Postgres I'm upgrading to (Postgres 15, I made sure to delete the data
> folder) and reinstalled it to try the upgrade a second and a third
> time, but both attempts failed, always at the same step, with the same
> error message. I also tested on one of the other machines where the upgrade
> had never succeeded, after disabling the antivirus entirely, and still got
> the error.
> I agree that it must be some other process that keeps the file visible to
> readdir but releases it before unlink runs, but I could not find which
> process (apart from the postgres processes) was using the WAL file. I
> was wondering whether it would be a suitable solution/workaround not to
> fail when trying to delete a file that is no longer there?
> I will continue looking for the process that could be reading the newly
> modified/created file, but I'm a bit out of luck for now
>
> On Wed, 29 May 2024 at 09:51, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>
>> On Tue, 2024-05-28 at 16:14 +0200, Waka Ranai wrote:
>> > We tested on the aforementioned computer after adding an exception on the pg_wal
>> > folder for the Microsoft default antivirus with
>> > Add-MpPreference -ExclusionPath "C:\Program Files\PostgreSQL\15\data\pg_wal"
>> > but we still faced the same issue, I included the pg_upgrade logs
>>
>> Thanks. I see
>>
>> command: "C:/Program Files/PostgreSQL/15/bin/pg_resetwal" -f -u 536 "C:/Program Files/PostgreSQL/15/data" >> "C:/Program Files/PostgreSQL/15/data/pg_upgrade_output.d/202405>
>> Write-ahead log reset
>>
>>
>> command: "C:/Program Files/PostgreSQL/15/bin/pg_resetwal" -f -x 3466214 "C:/Program Files/PostgreSQL/15/data" >> "C:/Program Files/PostgreSQL/15/data/pg_upgrade_output.d/20>
>> pg_resetwal: error: could not delete file "pg_wal/000000010000000000000001": No such file or directory
>>
>> So it is failing in KillExistingXLOG(): readdir() finds the file,
>> but by the time unlink() is executed, the file is already gone.
>> The file in question is the WAL segment written by WriteEmptyXLOG() in the
>> previous "pg_resetwal" execution.
>>
>> But the previous "pg_resetwal" has exited by the time the next one is
>> started, so it should not be at fault.
>>
>> I found this similar thread:
>> https://postgr.es/m/20090910094211.166C5753FB7%40cvs.postgresql.org
>> The symptoms are the same.
>>
>> I wonder if something like commit 4e2d5efc6a45b1f9f96df42629f6d1c7740e657e
>> would be useful here too. But it cannot be a PostgreSQL process that is
>> holding the file open - the creating process has already exited, and no
>> other PostgreSQL process would read the file.
>>
>> So the fact remains that there is something *outside of PostgreSQL* that
>> opens newly created files. You say you disabled the virus scanner, but
>> can you think of any other software on your system that would do that?
>> Perhaps you can try disabling the virus scanner completely and check if
>> that gets rid of the problem.
>>
>> Yours,
>> Laurenz Albe
>>
>
