
Re: Finding cause of test fails on the cfbot site

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject:	Re: Finding cause of test fails on the cfbot site
Date:	2021-02-22 14:46:50
Message-ID:	f430b639-7023-5afa-c1e1-e2b4a249cf1c@dunslane.net
Lists:	pgsql-hackers

On 2/21/21 10:34 PM, Andres Freund wrote:
> Hi,
>
> On 2021-02-17 15:18:02 -0500, Andrew Dunstan wrote:
>> yeah. The cfbot runs check-world which makes it difficult for it to know
>> which log files to show when there's an error. That's a major part of
>> the reason the buildfarm runs a much finer grained set of steps.
> I really think we need a better solution for this across the different
> use-cases of running tests. For development parallel check-world is
> important for a decent hack-test loop. But I waste a fair bit of time to
> scroll back to find the original source of failures. And on the
> buildfarm we waste a significant amount of time by limiting parallelism
> due to the non-parallel sequence of finer grained steps.

Maybe, but running fast isn't really a design goal of the buildfarm. It's
meant to be automated so it doesn't matter if it takes 10 or 20 or 60
minutes.

Another reason for using fine grained tasks is to be able to
include/exclude them as needed. See the buildfarm's
exclude-steps/only-steps parameters.

That said, there is some provision for parallelism, in that multiple
branches and multiple members can be tested at the same time, and
run_branches.pl will manage that fairly nicely for you. See
<https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_in_Parallel>
for details.

> And it's not just about logs - even just easily seeing the first
> reported test failure without needing to search through large amounts of
> text would be great.
>
> With, um, more modern buildtools (e.g. ninja) you'll at least get the
> last failure displayed at the end, instead of seeing a lot of other
> things after it like with make.
>
>
> My suspicion is that, given the need to have this work for both msvc and
> make, writing an in-core test-runner script is the only real option to
> improve upon the current situation.

OK ... be prepared for a non-trivial maintenance cost, however, which
will be borne by those of us fluent in perl. Perl is the only realistic
choice unless we want to add to the build dependencies, and fluency in
it is far from universal.

Part of the problem, which this isn't going to solve, is the sheer volume
of output that some tests produce. For example, the pg_dump tests produce
about 40k lines / 5 MB of log.

> For make it'd not be hard to add a recursive 'listchecks' target listing
> the individual tests that need to be run. Hacking up vcregress.pl to do
> that, instead of what it currently does, shouldn't be too hard either.
>
>
> Once there's a list of commands that need to be run it's not hard to
> write a loop in perl that runs up to N tests in parallel, saving their
> output, which then allows displaying the failing test reports at the
> end.
>
>
> If we then also add a convention that each test outputs something like
> TESTLOG: path/to/logfile
> ...
> it'd not be hard to add support for the test runner to list the files
> that cfbot et al should output.

Yeah, there is code in the buildfarm that contains a lot of building
blocks that can be used for this sort of stuff, see the PGBuild::Utils
and PGBuild::Log modules.
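
To make the shape of that proposal concrete, here is a minimal sketch of
such a runner. It is written in Python purely for illustration (the thread
assumes perl), and it is not taken from the buildfarm code or from any
existing script. The names run_one, run_all and report are hypothetical,
and the only convention it relies on is the "TESTLOG: path/to/logfile"
output line suggested in the quoted text. It runs up to N commands at once,
keeps each command's output, and prints only the failures, last.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Output convention proposed in the quoted text; not an existing feature.
TESTLOG_PREFIX = "TESTLOG: "


def run_one(cmd):
    """Run one test command, capturing stdout and stderr together."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    # Collect the log paths the test announced via TESTLOG lines.
    logs = [line[len(TESTLOG_PREFIX):].strip()
            for line in proc.stdout.splitlines()
            if line.startswith(TESTLOG_PREFIX)]
    return {"cmd": cmd, "rc": proc.returncode,
            "output": proc.stdout, "logs": logs}


def run_all(cmds, jobs=4):
    """Run at most `jobs` commands at a time; results keep input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, cmds))


def report(results, out=sys.stdout):
    """Print each failing test's saved output and logs, then a summary."""
    failed = [r for r in results if r["rc"] != 0]
    for r in failed:
        print("FAILED:", " ".join(r["cmd"]), file=out)
        print(r["output"], file=out, end="")
        for path in r["logs"]:
            print("  log file:", path, file=out)
    print(f"{len(results) - len(failed)} passed, {len(failed)} failed",
          file=out)
    return len(failed)
```

The list of commands would come from something like the 'listchecks'
target discussed above; a caller would pass that list to run_all() and use
the return value of report() as the exit status.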

> Looking around the tree, the most annoying bit to implement something
> like this is that things below src/bin/, src/interfaces, src/test,
> src/pl implement their own check, installcheck targets. Given the number
> of these that just boil down to a variant of
>
> check:
> 	$(pg_regress_check)
> 	$(prove_check)
> installcheck:
> 	$(pg_regress_installcheck)
>
> it seems we should lift the REGRESS and TAP_TESTS specific logic in
> pgxs.mk up into src/Makefile.global. Which then would make something
> like a global listchecks target easy.
>

Yeah, some of this stuff has grown a bit haphazardly, and maybe needs
some rework.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Re: Finding cause of test fails on the cfbot site at 2021-02-22 03:34:47 from Andres Freund
