Re: Buildfarm feature request: some way to track/classify failures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Buildfarm feature request: some way to track/classify failures
Date: 2007-03-19 22:58:28
Message-ID: 18456.1174345108@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> Actually what I *really* want is something closer to "show me all the
>> unexplained failures", but unless Andrew is willing to support some way
>> of tagging failures in the master database, I suppose that won't happen.

> Who would do the tagging, and how?

Well, that's the hard part, isn't it? I was sort of envisioning a group
of users who'd be authorized to log in and set tags on database entries
somehow. I'm not sure about details. One issue is that the majority
of failures come in batches (when one of us commits a bad patch).
With the current web interface it would be real tedious to verify which
of the failures in a particular time interval matched the symptoms of
a given failure. What I did for my experiment this weekend was to download
the last-stage-log of each failed build, which required an hour or so
of setup time; then I could use grep to confirm which logs matched a
failure that I'd identified. Doing that through the current webpage
would involve lots of clicking and waiting. If we could expose a
text-search-style API for grepping the stage logs, it'd be a lot easier
to collect related failures. Then maybe a few widgets to let authorized
users apply a tag to the search results ...
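A minimal sketch of the grep step, assuming the last-stage logs of the failed builds have already been downloaded into one local directory as `*.log` files. The directory layout and the function name `grep_stage_logs` are hypothetical and not part of the buildfarm software:

```python
import re
from pathlib import Path

def grep_stage_logs(log_dir, pattern):
    """Return the names of the *.log files in log_dir whose text matches pattern.

    Hypothetical helper: assumes one downloaded stage log per failed build.
    """
    # MULTILINE lets ^ and $ anchor on individual log lines.
    regex = re.compile(pattern, re.MULTILINE)
    matches = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps one badly encoded log from aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if regex.search(text):
            matches.append(path.name)
    return matches
```

For example, `grep_stage_logs("stage-logs", r"some failure signature")` lists every build whose log shows that symptom. A server-side text-search API would do the same thing without the download step.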

I'm not entirely sure that this infrastructure would pay for itself,
though. Without some users willing to take the time to separate
explained from unexplained failures, it'd be a waste of effort.
But we've already had a couple of cases of interesting failures going
unnoticed because of the noise level. Between duplicate reports about
busted patches and transient problems on particular build machines
(out of disk space, misconfiguration, etc.) it's pretty hard not to miss
the once-in-a-while failures. Is there some other way we could attack
that problem?
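One way to picture "show me all the unexplained failures" is a catalogue of known failure signatures (tag to regex) applied to every failed build's log, with whatever matches no signature left over for a human to look at. This is only a sketch under that assumption. The build IDs, the tag names, and the function `classify_failures` are hypothetical. "No space left on device" is the usual strerror text for ENOSPC, used here as an example of a transient machine problem:

```python
import re

def classify_failures(logs, known):
    """Split failed builds into explained and unexplained ones.

    logs:  dict mapping a (hypothetical) build ID to its stage-log text.
    known: dict mapping a tag to a regex for an already-explained failure.
    Returns (explained, unexplained): explained maps build ID -> list of
    matching tags; unexplained is a sorted list of build IDs no tag matched.
    """
    compiled = {tag: re.compile(p, re.MULTILINE) for tag, p in known.items()}
    explained = {}
    unexplained = []
    for build_id, text in logs.items():
        tags = [tag for tag, rx in compiled.items() if rx.search(text)]
        if tags:
            explained[build_id] = tags
        else:
            unexplained.append(build_id)
    return explained, sorted(unexplained)
```

Each busted patch would then need only one signature entry instead of one tag per duplicate report, and transient problems such as a full disk drop out of the unexplained list automatically.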

regards, tom lane
