Re: Improving tracking/processing of buildfarm test failures

From: Noah Misch <noah(at)leadboat(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving tracking/processing of buildfarm test failures
Date: 2024-05-24 20:00:35
Message-ID: 20240524200035.c2@rfd.leadboat.com
Lists: pgsql-hackers

On Thu, May 23, 2024 at 02:00:00PM +0300, Alexander Lakhin wrote:
> I'd like to discuss ways to improve the buildfarm experience for anyone who
> is interested in using the information the buildfarm gives us.
>
> Unless I'm missing something, as of now there are no means to determine
> whether some concrete failure is known/investigated or fixed, how
> frequently it occurs and so on... In my experience, it's not unusual for a
> failure that occurred two years ago and was then forgotten to have been an
> indication of, e.g., a race condition that still exists in the code/tests
> and is thus worth fixing. But without classifying/marking failures, it's hard
> to find such a failure, or any other interesting one, among many others...

I agree that consuming buildfarm results is difficult. I have an
inefficient template for studying a failure, which your proposals would improve:

- grep recent -hackers for animal name
- search the log for ~10 strings (e.g. "was terminated") to find the real
indicator of where it failed
- search mailing lists for that indicator
- search buildfarm database for that indicator
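
The log-search step of that template could be sketched roughly as below. This is only an illustration: apart from "was terminated", every indicator string here is a made-up placeholder, and `find_indicators` is a hypothetical helper, not an existing buildfarm tool.

```python
# Hypothetical indicator list; only "was terminated" comes from the text above.
INDICATORS = [
    "was terminated",
    "TRAP:",
    "PANIC:",
    "timed out",
]


def find_indicators(log_text, indicators=INDICATORS):
    """Return (line_number, indicator, line) for each log line that
    contains one of the indicator strings."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for indicator in indicators:
            if indicator in line:
                hits.append((lineno, indicator, line.strip()))
                break  # report each line once, under the first matching indicator
    return hits
```

The first hit is usually the string worth feeding into the mailing-list and buildfarm-database searches.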

> The first way to improve things I can imagine is to add two fields to the
> buildfarm database: a link to the failure discussion (set when the failure
> is investigated/reproduced and reported in -bugs or -hackers) and a commit
> id/link (set when the failure is fixed). I understand that it requires

I bet the hard part is getting data submissions, so I'd err on the side of
making this as easy as possible for submitters. For example, accept free-form
text for quick notes, not only URLs and commit IDs.
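
To make that concrete, here is a minimal sketch of what one annotation record might hold. Everything in it is an assumption for illustration: the class, its field names, and the notion of "known" are hypothetical and do not reflect the actual buildfarm schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureAnnotation:
    """Hypothetical annotation attached to one buildfarm failure report."""
    animal: str                            # buildfarm member name
    snapshot: str                          # identifies the failing run
    discussion_url: Optional[str] = None   # link to -bugs or -hackers thread
    fix_commit: Optional[str] = None       # commit id once the failure is fixed
    note: str = ""                         # free-form text for quick notes

    def is_known(self):
        # Any one of the three fields is enough to count as investigated.
        return bool(self.discussion_url or self.fix_commit or self.note.strip())


def unknown_failures(annotations):
    """Filter down to failures that nobody has annotated yet."""
    return [a for a in annotations if not a.is_known()]
```

Accepting the free-form note alone keeps the cost of a submission low, while the filter for unknown failures still works.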

> modifying the buildfarm code, and adding some UI to update these fields,
> but it allows adding filters to see only unknown/non-investigated failures
> in the buildfarm web interface later.
>
> The second way is to create a wiki page, similar to "PostgreSQL 17 Open
> Items", say, "Known buildfarm test failures" and fill it like below:
> <url to failure1>
> <url to failure2>
> ...
> Useful info from the failure logs for reference
> ...
> <link to -hackers thread>
> ---
> This way is less invasive, but it would work well only if most
> interested people know of it/use it.
> (I could start with the second approach, if you don't mind, and we'll see
> how it works.)

Certainly, your doing (2) can only help, though it may help less than (1).

I recommend considering what the buildfarm server could discover and publish
on its own. Examples:

- N members failed at the same step, in a related commit range. Those members
are now mostly green. Defect probably got fixed quickly.

- Log contains the following lines that are highly correlated with failure.
The following other reports, if any, also contained them.
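
The two examples above could be sketched roughly as follows. This is a minimal illustration, not how the buildfarm server works: the function names, the `(animal, step)` input shape, and the `min_ratio` threshold are all assumptions, and the failures are assumed to be already restricted to the commit range of interest. Real logs would also need timestamps and PIDs normalized before lines are compared.

```python
from collections import Counter, defaultdict


def group_by_step(failures):
    """failures: iterable of (animal, step) pairs.
    Return {step: sorted animal names} for steps where 2+ members failed."""
    by_step = defaultdict(set)
    for animal, step in failures:
        by_step[step].add(animal)
    return {step: sorted(animals)
            for step, animals in by_step.items()
            if len(animals) >= 2}


def correlated_lines(failed_logs, passed_logs, min_ratio=0.8):
    """Return lines present in at least min_ratio of the failed logs
    and in none of the passed logs."""
    failed_sets = [set(log.splitlines()) for log in failed_logs]
    passed_lines = set()
    for log in passed_logs:
        passed_lines.update(log.splitlines())
    # Count each line once per failed log it appears in.
    counts = Counter(line for lines in failed_sets for line in lines)
    threshold = min_ratio * len(failed_sets)
    return sorted(line for line, n in counts.items()
                  if n >= threshold and line not in passed_lines)
```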
