Re: Robocopy might be not robust enough for never-ending testing on Windows

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Robocopy might be not robust enough for never-ending testing on Windows
Date: 2024-09-17 05:00:00
Message-ID: 71a57d38-1c4f-4c2d-15e4-520802283c56@gmail.com
Lists: pgsql-hackers

Hello Thomas,

17.09.2024 04:01, Thomas Munro wrote:
> On Mon, Sep 16, 2024 at 6:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> So this leak looks like a recent and still existing defect.
> From my cartoon-like understanding of Windows, I would guess that if
> event handles created by a program are leaked after it has exited, it
> would normally imply that they've been duplicated somewhere else that
> is still running (for example see the way that PostgreSQL's
> dsm_impl_pin_segment() calls DuplicateHandle() to give a copy to the
> postmaster, so that the memory segment continues to exist after the
> backend exits), and if it's that, you'd be able to see the handle
> count going up in the process monitor for some longer running process
> somewhere (as seen in this report from the Chrome hackers[1]). And if
> it's not that, then I would guess it would have to be a kernel bug
> because something outside userspace must be holding onto/leaking
> handles. But I don't really understand Windows beyond trying to debug
> PostgreSQL at a distance, so my guesses may be way off. If we wanted
> to try to find a Windows expert to look at a standalone repro, does
> your PS script work with *any* source directory, or is there something
> about the initdb template, in which case could you post it in a .zip
> file so that a non-PostgreSQL person could see the failure mode?
>
> [1] https://randomascii.wordpress.com/2021/07/25/finding-windows-handle-leaks-in-chromium-and-others/

That's very interesting reading. I'll try to research the issue in that
depth later, though I guess this case is different: after logging off and
logging in as another user, I can't see any processes belonging to the
first one, while those "Event objects" in the non-paged pool still occupy
memory. Finding a Windows expert who could perhaps look at robocopy's
sources would be good too (and more productive).

So, the repro we can show is:
rm -r c:\temp\source
mkdir c:\temp\source
for ($i = 1; $i -le 1000; $i++)
{
    echo 1 > "c:\temp\source\$i"
}

for ($i = 1; $i -le 1000; $i++)
{
    echo "iteration $i"
    rm -r c:\temp\target
    robocopy.exe /E /NJH /NFL /NDL /NP c:\temp\source c:\temp\target
    Get-WmiObject -Class Win32_PerfRawData_PerfOS_Memory | % PoolNonpagedBytes
}

It produces for me (on Windows 10 [Version 10.0.19045.4780]):
iteration 1
...
216887296
...
iteration 1000

------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         1         0         0         0         0
   Files :      1000      1000         0         0         0         0
   Bytes :     7.8 k     7.8 k         0         0         0         0
   Times :   0:00:00   0:00:00                       0:00:00   0:00:00

   Speed :               17660 Bytes/sec.
   Speed :               1.010 MegaBytes/min.
   Ended : Monday, September 16, 2024 8:58:09 PM

365080576

Just "touch c:\temp\source\$i" is not enough; the files must be non-empty
for the leak to happen.
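
For a rough sense of scale, the two PoolNonpagedBytes readings quoted above can be turned into a per-iteration and per-file estimate. This is only a sketch of the arithmetic, with assumptions: the first reading (216887296) is taken to belong to iteration 1, the intermediate readings are elided in the output and so not used, and all of the growth is attributed to the robocopy loop rather than to other activity on the machine. The function name leak_estimate is hypothetical.

```python
# Readings of PoolNonpagedBytes quoted in the output above.
FIRST_READING = 216_887_296   # assumption: printed after iteration 1
LAST_READING = 365_080_576    # printed after iteration 1000

ITERATIONS = 1000             # outer loop count in the repro script
FILES_PER_ITERATION = 1000    # files created in c:\temp\source


def leak_estimate(first, last, iterations, files_per_iteration):
    """Return (total growth, growth per iteration, growth per copied file).

    Assumes the first reading was taken after iteration 1, so the growth
    is spread over (iterations - 1) further robocopy runs.
    """
    growth = last - first
    per_iteration = growth / (iterations - 1)
    per_file = per_iteration / files_per_iteration
    return growth, per_iteration, per_file


growth, per_iteration, per_file = leak_estimate(
    FIRST_READING, LAST_READING, ITERATIONS, FILES_PER_ITERATION
)
print(f"total growth:  {growth} bytes")
print(f"per iteration: {per_iteration:.0f} bytes")
print(f"per file:      {per_file:.1f} bytes")
```

Under those assumptions the non-paged pool grows by roughly 148 KB per robocopy run, i.e. on the order of 150 bytes per copied file.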

Best regards,
Alexander
