From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Alexander Lakhin' <exclusion(at)gmail(dot)com>, "'andrew(at)dunslane(dot)net'" <andrew(at)dunslane(dot)net> |
Cc: | "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Random pg_upgrade test failure on drongo |
Date: | 2023-11-30 10:00:21 |
Message-ID: | TY3PR01MB9889CD6B11182AEBDA95B798F582A@TY3PR01MB9889.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Alexander, Andrew,
Thanks for your analysis!
> I see that behavior on:
> Windows 10 Version 1607 (OS Build 14393.0)
> Windows Server 2016 Version 1607 (OS Build 14393.0)
> Windows Server 2019 Version 1809 (OS Build 17763.1)
>
> But it's not reproduced on:
> Windows 10 Version 1809 (OS Build 17763.1) (triple-checked)
> Windows Server 2019 Version 1809 (OS Build 17763.592)
> Windows 10 Version 22H2 (OS Build 19045.3693)
> Windows 11 Version 21H2 (OS Build 22000.613)
>
> So it looks like the failure occurs depending not on Windows edition, but
> rather on its build. For Windows Server 2019 the "good" build is
> somewhere between 17763.1 and 17763.592, but for Windows 10 it's between
> 14393.0 and 17763.1.
> (Maybe there was some change related to
> FILE_DISPOSITION_POSIX_SEMANTICS/
> FILE_DISPOSITION_ON_CLOSE implementation; I don't know where to find
> information about that change.)
>
> It's also interesting, what is full version/build of OS on drongo and
> fairywren.
Thanks for your interest in this issue. I have been tracking the failure, but it has not occurred again.
Your analysis suggests that the buildfarm (BF) failures could be resolved by updating the OS.
> I think that's because unlink() is performed asynchronously on those old
> Windows versions, but rename() is always synchronous.
OK. I could not find any documentation describing this behavior, but your experiments demonstrate it.
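To make the point concrete: if unlink() can return before the name has really disappeared, a caller that needs the name to be gone has to check for that itself. Below is a minimal sketch of the idea, not a proposed patch. unlink_and_wait() is a hypothetical name, not an existing PostgreSQL function, and the 10 ms poll interval is an arbitrary choice. It is plain POSIX C; I have not verified how stat() reports a delete-pending file on the affected Windows builds, so the ENOENT check is an assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L		/* for nanosleep() */
#endif

#include <errno.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove a file, then wait until its name is
 * really gone from the directory.
 *
 * Returns 0 once stat() reports ENOENT. Returns -1 if unlink() fails,
 * if stat() fails for another reason, or if the name is still visible
 * after max_tries polls (errno is set to EBUSY in that last case).
 */
static int
unlink_and_wait(const char *path, int max_tries)
{
	struct stat st;
	struct timespec delay = {0, 10 * 1000 * 1000};	/* 10 ms per poll */

	if (unlink(path) != 0)
		return -1;

	for (int i = 0; i < max_tries; i++)
	{
		if (stat(path, &st) != 0)
			return (errno == ENOENT) ? 0 : -1;

		/* The name is still visible, so the delete may be pending. */
		nanosleep(&delay, NULL);
	}

	errno = EBUSY;
	return -1;
}
```

On a system where unlink() is synchronous, the first stat() call already fails with ENOENT and the function returns immediately, so the extra check costs almost nothing there.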
> I've managed to reproduce that issue (or at least a situation that
> manifested similarly) with a sleep added in miscinit.c:
> ereport(IsPostmasterEnvironment ? LOG : NOTICE,
> (errmsg("database system is shut down")));
> + pg_usleep(500000L);
>
> With this change, I get the same warning as in [1] when running in
> parallel 10 tests 002_pg_upgrade with a minimal olddump (on iterations
> 33, 46, 8). And with my PoC patch applied, I could see the same warning
> as well (on iteration 6).
>
> I believe that's because rename() can't rename a directory containing an
> open file, just as unlink() can't remove it.
>
> In the light of the above, I think that the issue in question should be
> fixed in accordance with/as a supplement to [2].
OK, I understand that more needs to be fixed in this area. For now, we should stay focused on the issue at hand.
Your patch seems good, but it needs review from developers who are familiar with Windows.
What do others think?
Best Regards,
Hayato Kuroda
FUJITSU LIMITED