Re: Robocopy might be not robust enough for never-ending testing on Windows

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Robocopy might be not robust enough for never-ending testing on Windows
Date: 2024-09-16 06:00:00
Message-ID: 8b724988-ba94-25b4-8064-068b6c4b0520@gmail.com
Lists: pgsql-hackers

Hello Thomas,

14.09.2024 23:32, Thomas Munro wrote:
> On Sun, Sep 15, 2024 at 1:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> (That is, 0.1-0.2 MB leaks per one robocopy run.)
>>
>> I observed this on Windows 10 (Version 10.0.19045.4780), with all updates
>> installed, but not on Windows Server 2016 (10.0.14393.0). Moreover, using
>> robocopy v14393 on Windows 10 doesn't affect the issue.
> I don't understand Windows but that seems pretty weird to me, as it
> seems to imply that a driver or something fairly low level inside the
> kernel is leaking objects (at least by simple minded analogies to
> operating systems I understand better). Either that or robocopy.exe
> has userspace stuff involving at least one thread still running
> somewhere after it's exited, but that seems unlikely as I guess you'd
> have noticed that...

Yes, I see no robocopy process left after the test, and I think userspace
threads would not survive logoff.

> Just a thought: I was surveying the block cloning landscape across
> OSes and filesystems while looking into clone-based CREATE DATABASE
> (CF #4886) and also while thinking about the new TAP test initdb
> template copy trick, is that robocopy.exe tries to use Windows' block
> cloning magic, just like cp on recent Linux and FreeBSD systems (at
> one point I was wondering if that was causing some funky extra flush
> stalls on some systems, I need to come back to that...). It probably
> doesn't actually work unless you have Windows 11 kernel with DevDrive
> enabled (from reading, no Windows here), but I guess it still probably
> uses the new system interfaces, probably something like CopyFileEx().
> Does it still leak if you use /nooffload or /noclone?

I tested the following (with the script above):
Windows 10 (Version 10.0.19045.4780):
robocopy.exe (10.0.19041.4717) /NOOFFLOAD
iteration 1
496611328
...
iteration 1000
609701888

That is, it leaks.

/NOCLONE is not supported by that robocopy version:
ERROR : Invalid Parameter #1 : "/NOCLONE"

Then, Windows 11 (Version 10.0.22000.613), robocopy 10.0.22000.469:
iteration 1
141217792
...
iteration 996
151670784
...
iteration 997
152817664
...
iteration 1000
151674880

That is, it doesn't leak.

robocopy.exe /NOOFFLOAD
iteration 1
152666112
...
iteration 1000
153341952

No leak.

/NOCLONE is not supported by that robocopy version either.

Then I updated that Windows 11 to Version 10.0.22000.2538 (with KB5031358),
robocopy 10.0.22000.1516:
iteration 1
122753024
...
iteration 1000
244674560

It does leak.

robocopy /NOOFFLOAD
iteration 1
167522304
...
iteration 1000
283484160

It leaks as well.

Finally, I installed the newest Windows 11 (Version 10.0.22631.4169), with
robocopy 10.0.22621.3672:
Non-paged pool increased from 133 to 380 MB after 1000 robocopy runs.

robocopy /NOOFFLOAD leaks too.

/NOCLONE is not supported by that robocopy version either.

So this leak looks like a recent defect that is still present.
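
For reference, the per-run growth implied by the readings above can be
estimated with a short calculation. This is only a sketch, under two
assumptions that the message does not state: that the printed figures are
non-paged pool sizes in bytes, and that one robocopy run happens between
consecutive iterations (so iterations 1..N span N - 1 runs). The labels and
the function name are made up for illustration.

```python
# Readings copied from the message: (label, first sample, last sample, iterations).
# Assumption: the values are non-paged pool sizes in bytes.
READINGS = [
    ("Windows 10 19045.4780, /NOOFFLOAD", 496611328, 609701888, 1000),
    ("Windows 11 22000.613, default", 141217792, 151674880, 1000),
    ("Windows 11 22000.613, /NOOFFLOAD", 152666112, 153341952, 1000),
    ("Windows 11 22000.2538, default", 122753024, 244674560, 1000),
    ("Windows 11 22000.2538, /NOOFFLOAD", 167522304, 283484160, 1000),
]


def growth_per_run_kib(first, last, iterations):
    """Average growth in KiB per run between the first and last sample."""
    # Assumption: one robocopy run separates consecutive samples.
    return (last - first) / (iterations - 1) / 1024


if __name__ == "__main__":
    for label, first, last, iterations in READINGS:
        rate = growth_per_run_kib(first, last, iterations)
        print(f"{label}: {rate:.1f} KiB per run")
```

A simple first/last difference cannot separate a steady leak from ordinary
fluctuation (compare iterations 996, 997 and 1000 in the 22000.613 run), so
the result is only a rough indicator.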

(Sorry for the delay; fighting with OS updates/installation took me a while.)

Best regards,
Alexander
