From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Robocopy might be not robust enough for never-ending testing on Windows |
Date: | 2024-09-17 05:00:00 |
Message-ID: | 71a57d38-1c4f-4c2d-15e4-520802283c56@gmail.com |
Lists: | pgsql-hackers |
Hello Thomas,
17.09.2024 04:01, Thomas Munro wrote:
> On Mon, Sep 16, 2024 at 6:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> So this leak looks like a recent and still existing defect.
> From my cartoon-like understanding of Windows, I would guess that if
> event handles created by a program are leaked after it has exited, it
> would normally imply that they've been duplicated somewhere else that
> is still running (for example see the way that PostgreSQL's
> dsm_impl_pin_segment() calls DuplicateHandle() to give a copy to the
> postmaster, so that the memory segment continues to exist after the
> backend exits), and if it's that, you'd be able to see the handle
> count going up in the process monitor for some longer running process
> somewhere (as seen in this report from the Chrome hackers[1]). And if
> it's not that, then I would guess it would have to be a kernel bug
> because something outside userspace must be holding onto/leaking
> handles. But I don't really understand Windows beyond trying to debug
> PostgreSQL at a distance, so my guesses may be way off. If we wanted
> to try to find a Windows expert to look at a standalone repro, does
> your PS script work with *any* source directory, or is there something
> about the initdb template, in which case could you post it in a .zip
> file so that a non-PostgreSQL person could see the failure mode?
>
> [1] https://randomascii.wordpress.com/2021/07/25/finding-windows-handle-leaks-in-chromium-and-others/
That's very interesting reading. I'll try to research the issue in that depth
later (though I guess this case is different: after logging off and
logging in as another user, I can't see any processes belonging to the
first one, while those "Event objects" in the non-paged pool still occupy
memory), but finding a Windows expert who could perhaps look at
robocopy's sources would be good too (and more productive).
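So, the repro we can show is the following PowerShell script:
# Prepare a source directory containing 1000 small non-empty files
rm -r c:\temp\source -ErrorAction SilentlyContinue
mkdir c:\temp\source
for ($i = 1; $i -le 1000; $i++)
{
    echo 1 > "c:\temp\source\$i"
}
# Copy the directory 1000 times, printing the non-paged pool size after each copy
for ($i = 1; $i -le 1000; $i++)
{
    echo "iteration $i"
    rm -r c:\temp\target -ErrorAction SilentlyContinue
    robocopy.exe /E /NJH /NFL /NDL /NP c:\temp\source c:\temp\target
    Get-WmiObject -Class Win32_PerfRawData_PerfOS_Memory | % PoolNonpagedBytes
}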
It produces for me (on Windows 10 [Version 10.0.19045.4780]):
iteration 1
...
216887296
...
iteration 1000
------------------------------------------------------------------------------
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         1         0         0         0         0
   Files :      1000      1000         0         0         0         0
   Bytes :     7.8 k     7.8 k         0         0         0         0
   Times :   0:00:00   0:00:00                       0:00:00   0:00:00
   Speed :               17660 Bytes/sec.
   Speed :               1.010 MegaBytes/min.
   Ended : Monday, September 16, 2024 8:58:09 PM
365080576
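To put a number on the growth between the first and the last PoolNonpagedBytes readings above, here is a small Python sketch. It is not part of the repro itself. It assumes the first reading was taken after iteration 1 and the last after iteration 1000 (so 999 intervals). It also assumes the growth is spread evenly over the 1000 files copied per iteration. `pool_growth` is a hypothetical helper name.

```python
def pool_growth(first_iter, first_bytes, last_iter, last_bytes, files_per_iter):
    """Return (total growth, mean growth per iteration, mean growth per file).

    Assumes the pool grows linearly between the two readings (hypothetical
    helper; only the two readings posted in this message are available).
    """
    intervals = last_iter - first_iter
    if intervals <= 0:
        raise ValueError("last_iter must be greater than first_iter")
    total = last_bytes - first_bytes
    per_iter = total / intervals
    return total, per_iter, per_iter / files_per_iter


# Readings reported above: iteration 1 and iteration 1000, 1000 files each
total, per_iter, per_file = pool_growth(1, 216887296, 1000, 365080576, 1000)
print(f"total growth: {total} bytes ({total / 2**20:.1f} MiB)")  # 148193280 bytes (141.3 MiB)
print(f"per iteration: {per_iter:.0f} bytes")                   # 148342 bytes
print(f"per copied file: {per_file:.1f} bytes")                 # 148.3 bytes
```

Only two readings are posted, so this gives an average, not a per-iteration profile. Logging every reading from the loop would show whether the growth is really linear.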
Just "touch c:\temp\source\$i" is not enough: the files must be non-empty for
the leak to happen.
Best regards,
Alexander