Re: Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file

From: Waka Ranai <wakadotranai(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file
Date: 2024-05-31 12:01:36
Message-ID: CAP8Vo=9rYSNVDp+ohjTBFDk_H3JavdsUiLk+ptD7txR_uQAVeA@mail.gmail.com
Lists: pgsql-bugs

Hello again, I tested after disabling the Microsoft antivirus entirely and
it worked the first time. I then completely uninstalled the new Postgres
I'm upgrading to (Postgres 15; I made sure to delete the data folder) and
reinstalled it to try the upgrade a second and a third time, but both
attempts failed, always at the same step and with the same error message.
I also tested on one of the other machines where the upgrade had never
succeeded, again with the antivirus entirely disabled, and still got the
error.
I agree that some other process must be involved, since readdir() finds
the file but it is gone by the time unlink() runs. However, I could not
find out which process (apart from the postgres processes) was using the
WAL file. Would it be a suitable solution/workaround to not fail when
trying to delete a file that is no longer there?
I will keep looking for whatever process could be reading the newly
modified/created file, but I'm out of luck for now.
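
A minimal sketch of the workaround suggested above, written in C against
POSIX calls: delete matching files in a directory, but treat "file already
gone" (ENOENT) as success instead of an error. The helper names
unlink_if_exists and remove_matching_files are hypothetical and chosen for
illustration only; this is not the actual pg_resetwal code, and it does not
model Windows-specific file-sharing behavior.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL source): remove a file, but treat
 * "it is already gone" (ENOENT) as success.
 * Returns 0 on success, -1 on any other error (errno is left set).
 */
int
unlink_if_exists(const char *path)
{
    if (unlink(path) == 0)
        return 0;
    if (errno == ENOENT)
        return 0;               /* someone else removed it first */
    return -1;
}

/*
 * Hypothetical helper: remove every entry in "dir" whose name starts with
 * "prefix". A file that vanishes between readdir() and unlink() is not
 * considered a failure. Returns 0 on success, -1 on error.
 */
int
remove_matching_files(const char *dir, const char *prefix)
{
    DIR        *d = opendir(dir);
    struct dirent *de;
    char        path[4096];
    size_t      prefix_len = strlen(prefix);
    int         result = 0;

    if (d == NULL)
        return -1;

    /* Reset errno before each readdir() so errors can be told from EOF. */
    while (errno = 0, (de = readdir(d)) != NULL)
    {
        if (strncmp(de->d_name, prefix, prefix_len) != 0)
            continue;

        int n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);

        if (n < 0 || (size_t) n >= sizeof(path))
        {
            result = -1;        /* path would not fit in the buffer */
            break;
        }
        if (unlink_if_exists(path) != 0)
        {
            result = -1;        /* a real error, e.g. permission denied */
            break;
        }
    }

    /* readdir() returned NULL with errno set: a read error, not EOF. */
    if (de == NULL && errno != 0)
        result = -1;

    closedir(d);
    return result;
}
```

The trade-off is that ignoring ENOENT hides the symptom rather than
explaining which process touched the file, so it is a workaround and not a
diagnosis.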

On Wed, 29 May 2024 at 09:51, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
wrote:

> On Tue, 2024-05-28 at 16:14 +0200, Waka Ranai wrote:
> > We tested on the aforementioned computer after adding an exception on the pg_wal
> > folder for the Microsoft default antivirus with
> > Add-MpPreference -ExclusionPath "C:\Program Files\PostgreSQL\15\data\pg_wal"
> > but we still faced the same issue, I included the pg_upgrade logs
>
> Thanks. I see
>
> command: "C:/Program Files/PostgreSQL/15/bin/pg_resetwal" -f -u 536 "C:/Program Files/PostgreSQL/15/data" >> "C:/Program Files/PostgreSQL/15/data/pg_upgrade_output.d/202405>
> Write-ahead log reset
>
>
> command: "C:/Program Files/PostgreSQL/15/bin/pg_resetwal" -f -x 3466214 "C:/Program Files/PostgreSQL/15/data" >> "C:/Program Files/PostgreSQL/15/data/pg_upgrade_output.d/20>
> pg_resetwal: error: could not delete file "pg_wal/000000010000000000000001": No such file or directory
>
> So it is failing in KillExistingXLOG(): readdir() finds the file,
> but by the time unlink() is executed, the file is already gone.
> The file in question is the WAL segment written by WriteEmptyXLOG() in the
> previous "pg_resetwal" execution.
>
> But the previous "pg_resetwal" has exited by the time the next one is started,
> so it should not be at fault.
>
> I found this similar thread:
> https://postgr.es/m/20090910094211.166C5753FB7%40cvs.postgresql.org
> The symptoms are the same.
>
> I wonder if something like commit 4e2d5efc6a45b1f9f96df42629f6d1c7740e657e
> would be useful here too. But it cannot be a PostgreSQL process that is
> holding the file open - the creating process has already exited, and no
> other PostgreSQL process would read the file.
>
> So the fact remains that there is something *outside of PostgreSQL* that
> opens newly created files. You say you disabled the virus scanner, but can
> you think of any other software on your system that would do that?
> Perhaps you can try disabling the virus scanner completely and check if
> that gets rid of the problem.
>
> Yours,
> Laurenz Albe
>
