From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Robocopy might be not robust enough for never-ending testing on Windows |
Date: | 2024-09-14 13:00:00 |
Message-ID: | ae26fb03-30f6-790d-fbc4-240b30af79de@gmail.com |
Lists: | pgsql-hackers |
Hello hackers,
While trying to reproduce inexplicable drongo failures (e.g., [1])
seemingly related to BackgroundPsql, I stumbled upon a similar, but not
identical, issue. After many (6-8 thousand) iterations of the
015_stream.pl TAP test, psql failed to start with a STATUS_DLL_INIT_FAILED
error and a corresponding Windows popup that prevented the test from
exiting (see ss-1 in the attached archive).
Once the system reaches that state, subsequent test runs fail with one
or another memory-related error (not only in psql, but also in the
server processes):
testrun/subscription_13/015_stream/log/015_stream_publisher.log:2024-09-11 20:01:51.777 PDT [8812] LOG: server process
(PID 11532) was terminated by exception 0xC00000FD
testrun/subscription_14/015_stream/log/015_stream_subscriber.log:2024-09-11 20:01:19.806 PDT [2036] LOG: server process
(PID 10548) was terminated by exception 0xC00000FD
testrun/subscription_16/015_stream/log/015_stream_publisher.log:2024-09-11 19:59:41.513 PDT [9128] LOG: server process
(PID 14476) was terminated by exception 0xC0000409
testrun/subscription_19/015_stream/log/015_stream_publisher.log:2024-09-11 20:03:27.801 PDT [10156] LOG: server process
(PID 2236) was terminated by exception 0xC0000409
testrun/subscription_20/015_stream/log/015_stream_publisher.log:2024-09-11 19:59:41.359 PDT [10656] LOG: server process
(PID 14712) was terminated by exception 0xC000012D
testrun/subscription_3/015_stream/log/015_stream_publisher.log:2024-09-11 20:02:23.815 PDT [13704] LOG: server process
(PID 13992) was terminated by exception 0xC00000FD
testrun/subscription_9/015_stream/log/015_stream_publisher.log:2024-09-11 19:59:41.360 PDT [9760] LOG: server process
(PID 11608) was terminated by exception 0xC0000142
...
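For readers who don't have these codes memorized: the exception values in the log lines above are NTSTATUS codes. A minimal Python sketch that maps them to their symbolic names follows. The names come from the Windows ntstatus.h header, not from the logs themselves, and the helper name describe_exception is purely illustrative.

```python
# NTSTATUS codes seen in the log excerpts above, with their symbolic
# names as defined in the Windows ntstatus.h header.
NTSTATUS_NAMES = {
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
    0xC000012D: "STATUS_COMMITMENT_LIMIT",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
}


def describe_exception(code: int) -> str:
    """Format a code the way the server log prints it, plus its name."""
    name = NTSTATUS_NAMES.get(code, "unknown")
    return f"0x{code:08X} ({name})"


for code in NTSTATUS_NAMES:
    print(describe_exception(code))
```

STATUS_COMMITMENT_LIMIT and STATUS_DLL_INIT_FAILED in particular are consistent with a system that has run out of commit charge.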
When tests fail, I see Commit Charge reaching 100% (see ss-2 in the
attachment), while Physical Memory isn't all used up. To get the OS back
to a working state, I had to reboot it; killing processes and logging off
and on again didn't help.
I reproduced this issue again, investigated it, and found that it is
caused by robocopy (used by PostgreSQL::Test::Cluster->init), which
leaks kernel objects, namely "Event objects" in the Non-Paged pool, on
each run.
This can be seen with Kernel Pool Monitor, or just with this simple
PowerShell script:
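for ($i = 1; $i -le 100; $i++)
{
    echo "iteration $i"
    # Start each iteration from an empty target directory.
    rm -r c:\temp\target
    # Copy the template tree; /NJH /NFL /NDL /NP suppress the header and per-file output.
    robocopy.exe /E /NJH /NFL /NDL /NP c:\temp\initdb-template c:\temp\target
    # Print the current size of the non-paged pool, in bytes.
    Get-WmiObject -Class Win32_PerfRawData_PerfOS_Memory | % PoolNonpagedBytes
}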
It shows me:
iteration 1
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :        27        27         0         0         0         0
   Files :       968       968         0         0         0         0
   Bytes :   38.29 m   38.29 m         0         0         0         0
   Times :   0:00:00   0:00:00                       0:00:00   0:00:00
...
1226063872
...
iteration 100
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :        27        27         0         0         0         0
   Files :       968       968         0         0         0         0
   Bytes :   38.29 m   38.29 m         0         0         0         0
   Times :   0:00:00   0:00:00                       0:00:00   0:00:00
...
1245220864
(That is, roughly 0.1-0.2 MB is leaked per robocopy run.)
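The per-run figure can be checked from the two PoolNonpagedBytes samples shown above. This is only a rough sketch: it assumes the growth is linear and attributable to robocopy alone, which the two samples cannot prove by themselves, and the variable names are mine.

```python
# PoolNonpagedBytes values printed after iteration 1 and iteration 100.
first_sample = 1226063872
last_sample = 1245220864

# 99 robocopy runs happened between the two samples.
runs_between = 100 - 1

leak_per_run = (last_sample - first_sample) / runs_between
print(f"{leak_per_run:.0f} bytes (~{leak_per_run / 1e6:.2f} MB) per run")
```

That average lands at the upper end of the 0.1-0.2 MB range quoted above.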
I observed this on Windows 10 (Version 10.0.19045.4780) with all updates
installed, but not on Windows Server 2016 (10.0.14393.0). Moreover, using
robocopy v14393 on Windows 10 doesn't make the leak go away.
Perhaps this information will be helpful for someone running
buildfarm/CI tests on Windows animals...
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-09-11%2007%3A24%3A53
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
ss.zip | application/zip | 188.1 KB |