Re: how to speed up 002_pg_upgrade.pl and 025_stream_regress.pl under valgrind

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: how to speed up 002_pg_upgrade.pl and 025_stream_regress.pl under valgrind
Date: 2024-09-16 11:34:04
Message-ID: 5c1f4808-7221-4a81-8e7b-16801eb6ed97@vondra.me
Lists: pgsql-hackers

On 9/15/24 21:47, Tomas Vondra wrote:
> On 9/15/24 20:31, Tom Lane wrote:
>> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>>> [ 002_pg_upgrade and 027_stream_regress are slow ]
>>
>>> I don't have a great idea how to speed up these tests, unfortunately.
>>> But one of the problems is that all the TAP tests run serially - one
>>> after each other. Could we instead run them in parallel? The tests setup
>>> their "private" clusters anyway, right?
>>
>> But there's parallelism within those two tests already, or I would
>> hope so at least. If you run them in parallel then you are probably
>> causing 40 backends instead of 20 to be running at once (plus 40
>> valgrind instances). Maybe you have a machine beefy enough to make
>> that useful, but I don't.
>>
>
> I did look into that for both tests, albeit not very thoroughly, and
> most of the time there were only 1-2 valgrind processes using CPU. The
> stream_regress seems more aggressive, but even for that the CPU spikes
> are short, and the machine could easily do something else in parallel.
>
> I'll try to do better analysis and some charts to visualize this ...

I see there's already a discussion about how to make these tests cheaper
by running only a subset of the regression tests, but here are two
charts showing the number of processes and the CPU usage for the two
tests (under valgrind). In both cases there are occasional spikes with
>10 backends and high CPU usage, but most of the time it's only 1-2
processes, using 1-2 cores.

In fact, the two charts are almost exactly the same - which is somewhat
expected, considering the expensive part is running regression tests,
and that's the same for both.

But doesn't this also mean we might speed up check-world by reordering
the tests a bit? The low-usage parts happen because one of the tests in
a group takes much longer than the others, so what if we moved those
slow tests into a group of their own?
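
To illustrate the reordering idea, here is a rough sketch of the arithmetic.
It assumes groups run one after another, tests within a group run fully in
parallel, and a test's duration does not depend on what runs next to it (not
quite true under valgrind). The test names and durations are made up for
illustration, not measured.

```python
def schedule_wall_time(groups):
    """Estimate wall time for a schedule of test groups.

    Groups run sequentially; tests inside a group run in parallel,
    so each group costs as much as its slowest test.
    """
    return sum(max(group.values()) for group in groups if group)


# Hypothetical durations in seconds (assumed, not measured).
original = [
    {"slow_a": 100, "fast_1": 5, "fast_2": 5},
    {"slow_b": 80, "fast_3": 5, "fast_4": 5},
]

# Same tests, but the two slow ones share a group of their own.
reordered = [
    {"slow_a": 100, "slow_b": 80},
    {"fast_1": 5, "fast_2": 5, "fast_3": 5, "fast_4": 5},
]

original_time = schedule_wall_time(original)    # 100 + 80
reordered_time = schedule_wall_time(reordered)  # 100 + 5
print(original_time, reordered_time)
```

In this toy model the original schedule takes 180 seconds and the reordered
one 105, because the second slow test no longer holds up a group by itself.
Real schedules also have ordering dependencies between tests, which this
sketch ignores.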

regards

--
Tomas Vondra

Attachment Content-Type Size
image/png 53.3 KB
image/png 47.4 KB
