Re: why is pg_upgrade's regression run so slow?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: why is pg_upgrade's regression run so slow?
Date: 2024-07-28 01:19:38
Message-ID: CA+hUKGKO5RzE6Gj8a3ZcV7bANCwYoUZdyh4yWCXfgWOkT8ULGQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jul 28, 2024 at 10:48 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Interesting. Maybe meson is over-aggressively trying to run these
> test suites in parallel?

Hypothesis: NTFS might not be as good at linking/unlinking lots of
files concurrently due to forced synchronous I/O, causing queuing?
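
One rough way to probe that hypothesis would be a microbenchmark along the following lines. This is only a sketch, not something taken from the PostgreSQL tree: the function names, file counts, and worker counts are made up for illustration, and it uses threads where the real test suites use separate processes. Each worker creates, hard-links, and unlinks small files in its own directory, so total work grows with the worker count. If the file system handles concurrent metadata operations well, elapsed time should stay roughly flat as workers are added; if operations queue up, it should climb.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def churn(directory, n_files):
    """Create, hard-link, and unlink n_files small files in directory."""
    for i in range(n_files):
        path = os.path.join(directory, f"f{i}")
        link = path + ".lnk"
        with open(path, "wb") as f:
            f.write(b"x")
        os.link(path, link)
        os.unlink(link)
        os.unlink(path)


def timed_run(root, workers, n_files):
    """Run churn() in `workers` threads, one subdirectory each; return seconds."""
    dirs = []
    for w in range(workers):
        d = os.path.join(root, f"w{w}")
        os.mkdir(d)
        dirs.append(d)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(lambda d: churn(d, n_files), dirs))
    return time.perf_counter() - start


def compare(n_files=200, workers=(1, 4)):
    """Return {worker_count: elapsed_seconds}, using a fresh temp dir per run."""
    results = {}
    for w in workers:
        with tempfile.TemporaryDirectory() as root:
            results[w] = timed_run(root, w, n_files)
    return results


if __name__ == "__main__":
    for w, secs in compare().items():
        print(f"{w} worker(s): {secs:.3f}s")
```

The temporary directory lands wherever the platform's default temp location is, so to compare volumes it would have to be pointed at the drive under test (for example via the `dir=` argument of `tempfile.TemporaryDirectory`).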

That's what [1] was about for FreeBSD CI. I couldn't immediately see
how to make a RAM disk on Windows, but at the time I had a hunch that
that OS was struggling in the same way.

Could some tuning help? Disable 8dot3name (a thing that creates a
ye-olde-MSDOS-compatible second directory entry for every file),
adjust disablelastaccess (something like noatime), disable USN journal
(a kind of secondary journal of all file system operations that is
used to drive the change notification API that we don't care about),
disable write cache flush so that any synchronous I/O operations don't
wait for that (at the risk of corruption on power loss, but maybe it's
OK on a device dedicated to temporary workspace)? This is just from
some quick googling, but perhaps someone who actually knows how to
drive Windows locally and use the performance monitoring tools could
tell us what it's actually waiting on...
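
For anyone who wants to try those knobs, the first three map onto `fsutil` subcommands, as far as I can tell. The sketch below only builds the command lines and prints them; it does not run anything. The helper name and the default drive letter are hypothetical, the commands need an elevated prompt, and deleting the USN journal in particular should only be done on a scratch volume. The write-cache flush setting is not covered, since to my knowledge it is a per-device policy in Device Manager rather than an `fsutil` option.

```python
def tuning_commands(drive="C:"):
    """Return fsutil command lines for the NTFS tweaks discussed above.

    `drive` is an assumed scratch volume; nothing here is executed.
    """
    return [
        # Stop creating 8.3 short names (value 1 applies to all volumes)
        ["fsutil", "behavior", "set", "disable8dot3", "1"],
        # Stop updating last-access timestamps (roughly like noatime)
        ["fsutil", "behavior", "set", "disablelastaccess", "1"],
        # Delete and disable the USN change journal on the given volume
        ["fsutil", "usn", "deletejournal", "/d", drive],
    ]


if __name__ == "__main__":
    for cmd in tuning_commands("D:"):
        print(" ".join(cmd))
```

Measuring a test run before and after each change separately would show which of them, if any, matters.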

I noticed there is a new thing called Dev Drive[2] on Windows 11,
which claims to be faster for developer workloads and there are
graphs[3] showing various projects' test suites going faster. It's
ReFS, a COW file system. From some quick googling, the CopyFile()
system call does a fast clone there, and that should affect the robocopy command
in Cluster.pm (note: Unixoid cp in there also uses COW cloning on at
least xfs, zfs, probably apfs too). So I would be interested to know
if that goes faster ... or slower. I'm also interested in how it
reacts to the POSIX-semantics mode[4]; that might affect whether we
can ever pull the trigger on that idea.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BFXLcEg1dyTqJjDiNQ8pGom4KrJj4wF38C90thti9dVA%40mail.gmail.com
[2] https://learn.microsoft.com/en-us/windows/dev-drive/
[3] https://devblogs.microsoft.com/visualstudio/devdrive/
[4] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
