Re: Adding CI to our tree

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: Adding CI to our tree
Date: 2022-02-13 21:42:13
Message-ID: 20220213214213.GS31460@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 12, 2022 at 02:26:25PM -0800, Andres Freund wrote:
> On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote:
> > I had some success with that, but it doesn't seem to be significantly faster -
> > it looks a lot like the tests are not actually running in parallel.

Note that the total test time is close to the sum of the individual test times.
But I think that may be an artifact of how prove is showing/attributing times
to each test (which, if so, is misleading).

> Note that prove unfortunately serializes the test output to be in the order it
> started them, even when tests run concurrently. Extremely unhelpful, but ...

Are you sure ? When I run it locally, I see:
rm -fr src/test/recovery/tmp_check ; time PERL5LIB=`pwd`/src/test/perl TESTDIR=`pwd`/src/test/recovery PATH=`pwd`/tmp_install/usr/local/pgsql/bin:$PATH PG_REGRESS=`pwd`/src/test/regress/pg_regress REGRESS_SHLIB=`pwd`/src/test/regress/regress.so prove --time -j4 --ext '*.pl' `find src -name t`
...
[15:34:48] src/bin/scripts/t/101_vacuumdb_all.pl ....................... ok 104 ms ( 0.00 usr 0.00 sys + 2.35 cusr 0.47 csys = 2.82 CPU)
[15:34:49] src/bin/scripts/t/090_reindexdb.pl .......................... ok 8894 ms ( 0.06 usr 0.01 sys + 14.45 cusr 3.38 csys = 17.90 CPU)
[15:34:50] src/bin/pg_config/t/001_pg_config.pl ........................ ok 79 ms ( 0.00 usr 0.01 sys + 0.23 cusr 0.04 csys = 0.28 CPU)
[15:34:50] src/bin/pg_waldump/t/001_basic.pl ........................... ok 35 ms ( 0.00 usr 0.00 sys + 0.26 cusr 0.02 csys = 0.28 CPU)
[15:34:51] src/bin/pg_test_fsync/t/001_basic.pl ........................ ok 100 ms ( 0.01 usr 0.00 sys + 0.24 cusr 0.04 csys = 0.29 CPU)
[15:34:51] src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl ........ ok 177 ms ( 0.02 usr 0.00 sys + 0.26 cusr 0.03 csys = 0.31 CPU)
[15:34:55] src/bin/scripts/t/100_vacuumdb.pl ........................... ok 11267 ms ( 0.12 usr 0.04 sys + 13.47 cusr 3.20 csys = 16.83 CPU)
[15:34:57] src/bin/scripts/t/102_vacuumdb_stages.pl .................... ok 5802 ms ( 0.06 usr 0.01 sys + 7.70 cusr 1.37 csys = 9.14 CPU)
...

=> scripts/ stuff, followed by other stuff, followed by more, slower, scripts/ stuff.

But I never saw that in cirrus.

> Isn't this kind of a good test time? I thought earlier your alltaptests target
> took a good bit longer?

The original alltaptests runs in 16m 21s.
https://cirrus-ci.com/task/6679061752709120

2 weeks ago, it was ~14min with your patch to cache initdb.
https://cirrus-ci.com/task/5439320633901056

As I recall, that didn't seem to improve runtime when combined with my parallel
patch.

> One nice bit is that the output is a *lot* easier to read.
>
> You could try improving the total time by having prove remember slow tests and
> use that file to run the slowest tests first next time. --state slow,save or
> such I believe. Of course we'd have to save that state file...

In a test, this hurt rather than helped (13m 42s).
https://cirrus-ci.com/task/6359167186239488

I'm not surprised - it makes sense to run 10 fast tests at once, but usually
doesn't make sense to run 10 slow tests tests at once (at least a couple of
which are doing something intensive). It was faster (12m16s) to do it
backwards (fastest tests first).
https://cirrus-ci.com/task/5745115443494912

BTW, does it make sense to remove test_regress_parallel_script ? The
pg_upgrade run would do the same things, no ? If so, it might make sense to
run that first. OTOH, you suggested to run the upgrade tests with checksums
enabled, which seems like a good idea.

Note that in the attached patches, I changed the msvc "warnings" to use "tee".

I don't know how to fix the pipeline test in a less hacky way...

You said the docs build should be a separate task, but then said that it'd be
okay to remove the dependency. So I did it both ways. There's currently some
duplication between the docs patch and code coverage patch.

--
Justin

Attachment Content-Type Size
0001-cirrus-include-hints-how-to-install-OS-packages.patch text/x-diff 2.0 KB
0002-cirrus-windows-add-compiler_warnings_script.patch text/x-diff 1.1 KB
0003-cirrus-upload-changed-html-docs-as-artifacts.patch text/x-diff 2.6 KB
0004-s-build-docs-as-a-separate-task.patch text/x-diff 3.6 KB
0005-vcregress-ci-test-modules-contrib-with-NO_INSTALLCHE.patch text/x-diff 7.8 KB
0006-wip-cirrus-code-coverage.patch text/x-diff 2.6 KB
0007-vcsregress-add-alltaptests.patch text/x-diff 4.2 KB
0008-vcregress-run-alltaptests-in-parallel.patch text/x-diff 4.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-02-13 21:43:17 Re: [PATCH] Avoid open and lock the table Extendend Statistics (src/backend/commands/statscmds.c)
Previous Message Justin Pryzby 2022-02-13 21:31:20 Re: Adding CI to our tree