Buildfarm coverage planning (was: what's going on with lapwing?)

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, pgbuildfarm(at)rjuju(dot)net
Subject: Buildfarm coverage planning (was: what's going on with lapwing?)
Date: 2025-03-08 13:11:18
Message-ID: 07802ece-8718-40f0-8713-8fc1f3558e4e@dunslane.net
Lists: pgsql-hackers


On 2025-03-07 Fr 8:52 AM, Robert Haas wrote:
> On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>> Honestly, it's been years of people complaining on one thing or another about
>> lapwing without ever asking for a change. Was it really hard to ask "can you
>> remove the -Werror it's not useful anymore" the first time it caused extra
>> work? Instead I have to guess what people want. So after a few complaints I
>> removed that flag. And now after a few more complaints I turned it off. If
>> that's not what you want, well too bad but that's on you, not me.
> This is actually much harder for me as a committer than you might
> guess. How is an individual committer working on an individual issue
> supposed to know that removing -Wall is the right thing vs. fixing the
> warning in some way? As Melanie also mentions in her reply, committers
> are not born knowing or understanding which machines are chronic
> problems, and sometimes it's very difficult to figure that out. If you
> read every message on the mailing list you're probably going to have
> more idea than if you don't, but that's more than most people can do
> these days, and you can still miss things.
>
> But I think your complaint here is actually getting at another problem
> with the way that we do the buildfarm as a project: it's completely
> unplanned. People show up and run random buildfarm machines and nobody
> knows why they are running those machines: was it because they care
> about support for that platform, or was it just to be nice, or were
> they testing some unusual configuration, or just because they set it
> up a long time ago and have never turned it off? And then other
> buildfarm members that maybe we ought to have are missing and nobody
> knows if anyone else is working on that, or maybe they don't even know
> about it. And then, to your point, nobody ever shows up and tells a
> buildfarm member owner what we'd like them to do. There is no "what
> we'd like them to do" -- we have no policy or preference or anything
> as a group. Everybody's just guessing what other people want and care
> about, and then sometimes we're all grumpy at each other.
>
> But notice that with CI, it's the other way around. Some small group
> of people decide what our CI setup should do and then they configure
> it to do that thing. You can agree with those decisions or not, but
> they are intentional. The platforms and OS versions that are being
> tested are what somebody decided was best from among the available
> options. Now I'm not saying that sort of centralized planning is
> without flaws -- and the fact that only a handful of people seem to
> understand how to keep this CI stuff working is definitely one of them
> -- but it also has some strengths, namely that you remove a lot of
> this guesswork around what the other people involved actually want.
>
> I'm not sure exactly where I'm going with this line of thought, but I
> do wonder if we ought to find a way to be more intentional about the
> buildfarm instead of just letting things happen and then being sad
> that we didn't get what we wanted.
>

I've renamed the thread because it's gone a long way past what happened
with lapwing.

I agree that the coverage is unplanned, and that we should do better
about it.

However, we can't cover every possible combination of architecture, OS,
compiler, build system, and build options. Roughly 170 buildfarm animals
are currently reporting. Comprehensive coverage would mean a fleet
of thousands, even if we ran multiple animals on a single machine. And
I'm not sure that would be a win - the status page would become
unmanageable and useless. So we need to be strategic about it.
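
As a rough illustration of that arithmetic, here is a minimal sketch that
counts the full cross product of the five dimensions. The values listed
for each dimension are made up for the example (they are an assumption,
not an inventory of what the buildfarm covers or should cover), and
count_combinations is a hypothetical helper, not part of the buildfarm
client.

```python
from math import prod

# Hypothetical values for each dimension, for illustration only.
# They are NOT the buildfarm's actual platform list.
DIMENSIONS = {
    "architecture": ["x86_64", "aarch64", "ppc64le", "s390x", "riscv64"],
    "os": ["Linux", "FreeBSD", "OpenBSD", "NetBSD", "macOS", "Windows"],
    "compiler": ["gcc", "clang", "msvc"],
    "build_system": ["autoconf", "meson"],
    "build_options": ["default", "assert", "sanitizers", "no-ssl", "icu"],
}


def count_combinations(dimensions):
    """Return the size of the full cross product of all dimension values."""
    return prod(len(values) for values in dimensions.values())


if __name__ == "__main__":
    # 5 * 6 * 3 * 2 * 5 = 900
    print(count_combinations(DIMENSIONS))
```

Even this toy matrix is several times the size of the current fleet. Some
of those combinations are not buildable at all, but each OS and compiler
also comes in multiple versions, which pushes the count up again.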

So let's start with a couple of simple questions:

1. what are the perceived major gaps in coverage?

2. what animals (if any) should be turned off or updated?

I'll start with my nomination for 1: we need a Windows animal that
builds in the same way and with the same options as the installer
binaries. That's something we are actually working on at EDB.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
