Re: 503 Backend fetch failed errors

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: remi_zara(at)mac(dot)com, buildfarm-members(at)lists(dot)postgresql(dot)org, sysadmins <sysadmins(at)lists(dot)postgresql(dot)org>
Subject: Re: 503 Backend fetch failed errors
Date: 2018-11-08 09:04:33
Message-ID: CABUevEw7FJNK7mcZL66r-wUV27wCfMQvuKBCqt+YRRJLqdm30Q@mail.gmail.com
Lists: buildfarm-members

On Thu, Nov 8, 2018 at 9:42 AM Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
wrote:

> On 11/8/18 9:35 AM, Magnus Hagander wrote:
> >
> >
> > On Thu, Nov 8, 2018 at 9:31 AM Stefan Kaltenbrunner
> > <stefan(at)kaltenbrunner(dot)cc> wrote:
> >
> > On 11/8/18 9:19 AM, Magnus Hagander wrote:
> > >
> > >
> > > On Thu, Nov 8, 2018 at 9:01 AM Stefan Kaltenbrunner
> > > <stefan(at)kaltenbrunner(dot)cc> wrote:
> > >
> > > On 11/7/18 6:40 PM, Rémi Zara wrote:
> > > > Hi,
> > >
> > > Hi Rémi!
> > >
> > > >
> > > > I’m getting a lot of these errors with coypu (several per day),
> > > > but not systematically.
> > > > Is this a problem on my end, or is this on the server end?
> > > >
> > > > Query for: stage=OK&animal=coypu&ts=1541573277
> > > > Target: https://buildfarm.postgresql.org/cgi-bin/pgstatus.pl/53b137e7c765b781699bbe73e3aec7751a8c4ab7
> > > > Status Line: 503 Backend fetch failed
> > > > Web txn failed with status: 1
> > > > Query for: stage=OK&animal=coypu&ts=1541575423
> > > > Target: https://buildfarm.postgresql.org/cgi-bin/pgstatus.pl/5e8e0913dc9f9a580a4125264d74fff95f26c926
> > > > Status Line: 503 Backend fetch failed
> > > > Web txn failed with status: 1
> > >
> > > Given the error, this is something that is created by the Varnish
> > > instance that is in front of the buildfarm. On a quick look I couldn't
> > > immediately figure out what the problem is - but it looks like you (or
> > > somebody else) at least tried to click one of the links above using
> > > a desktop browser and got an error about a missing branch
> > > specification ;)
> > >
> > >
> > >
> > > AFAICT:
> > >
> > > A quick look in the logs indicates that the buildfarm is responding:
> > > - RespHeader Status: 492 bad branch parameter
> > >
> > > However, 492 is not a valid HTTP status code, so Varnish can't handle it
> > > and thus returns a 503 failure to the client.
> >
> > I think that is not the actual error that Rémi is experiencing - the 492
> > case (which is indeed an invalid HTTP status code) only happens when one
> > actually clicks the link in the mail above (which I guess someone did and
> > you found in the logs), because the actual BF client will add a parameter
> > to the "Target" URL.
> >
> > The actual "errors" don't seem to show up in the lighttpd logs AFAICS.
> >
> >
> > Oh, sorry. I was checking the one called "target", I assumed that was
> > the URL that failed.
> >
> > Assuming for the original ones the ts is part of the URL, none of that
> > is still in the logs. Or are they post parameters? Do we know exactly
> > which URL is actually failing, and when (exactly) this happened?
>
> Well - most of the parameters for each URL are in the error report (e.g.
> "Query for: stage=OK&animal=coypu&ts=1541575423"). I dunno whether Rémi
> knows which branch that was for. That one also has a Unix timestamp,
> though I "think" that is the timestamp from when the build started on
> the bf-client and not the time when the request was made.
>
> AFAICS the two requests are not in the lighty log at all, so only Varnish
> might have seen them (though it's unclear what error it got while
> connecting to lighty).
>

They'll be hard to find in the Varnish log without actually having the URL.
There is nothing in the Varnish log with 1541575423 in it at all. And there
is nothing with "coypu" and an HTTP 503 in it either. And the log goes back
to Nov 6...
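
For reference, the two ts values from the quoted error report can be decoded as Unix timestamps. This is only a minimal Python sketch; the helper name decode_ts is made up for illustration. As noted in the quoted text, the ts is believed to be the build start time on the client rather than the time of the failing request, so it only hints at the earliest point worth looking at in the logs.

```python
from datetime import datetime, timezone


def decode_ts(ts: int) -> str:
    """Render a Unix timestamp (seconds since the epoch) as a UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )


# The two ts values taken from the "Query for:" lines in the error report.
for ts in (1541573277, 1541575423):
    print(ts, "->", decode_ts(ts))
# 1541573277 -> 2018-11-07 06:47:57 UTC
# 1541575423 -> 2018-11-07 07:23:43 UTC
```

Both values fall on Nov 7 (UTC), which is inside the period the log covers.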

So my guess is it might be a POST which doesn't actually have the animal
name or the timestamp on the URL.

I do see some general POSTs returning 503. They all seem to go to
pgstatus.pl like the ones above, so maybe that is the URL after all?
If I look at just the POSTs there, I see a single one, and it has:

- FetchError Resource temporarily unavailable
- FetchError straight insufficient bytes

"Straight insufficient bytes" means there is a mismatch between
Content-Length and the actual amount of data sent/read.
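
To illustrate that kind of mismatch, here is a minimal Python sketch. It is not how Varnish implements the check, and the function name content_length_shortfall is hypothetical. It just compares a declared Content-Length against the number of body bytes that actually arrived.

```python
def content_length_shortfall(declared_length: int, received: bytes) -> int:
    """Return how many bytes are missing relative to the declared length.

    0 means the body is complete; a positive value means the sender
    stopped before delivering everything it announced.
    """
    return declared_length - len(received)


# Assumed scenario: the client announces 4160573 bytes (the value from
# the log) but only 1000 bytes arrive before the read fails.
declared = 4160573
received = b"x" * 1000
missing = content_length_shortfall(declared, received)
if missing > 0:
    print(f"insufficient bytes: {missing} of {declared} still missing")
```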

And on the backend side:

-- FetchError req.body read error: 11 (Resource temporarily unavailable)

I believe this means that Varnish is actually failing to read the request
body from the *client*, in order to pass it on to the server. In that case,
it could be that the client sends the wrong length. It does send a
Content-Length header of 4160573 bytes -- perhaps it stops sending data
before it gets there. Is that a "reasonable size" for what is being sent? It's
quite a big POST.

The error occurred 7.25 seconds after Varnish started talking to lighttpd.
So it at least did something first. Perhaps if it actually is bigger than
4MB it hit some sort of limit and lighttpd killed the request?
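
Whether 4160573 bytes counts as "bigger than 4MB" depends on how a limit would be defined, and the limit actually configured in lighttpd (if any) is not known here. A small Python sketch of the arithmetic, using only the Content-Length value from the log:

```python
CONTENT_LENGTH = 4160573  # bytes, from the Content-Length header in the log

MB = 1000 * 1000   # decimal megabyte
MIB = 1024 * 1024  # binary mebibyte

over_decimal_4mb = CONTENT_LENGTH > 4 * MB   # above 4,000,000 bytes?
over_binary_4mib = CONTENT_LENGTH > 4 * MIB  # above 4,194,304 bytes?
headroom_to_4mib = 4 * MIB - CONTENT_LENGTH  # bytes left below 4 MiB

print(f"{CONTENT_LENGTH / MIB:.3f} MiB")
print("over 4 MB (decimal):", over_decimal_4mb)
print("over 4 MiB (binary):", over_binary_4mib)
print("bytes below 4 MiB:", headroom_to_4mib)
```

So the request is over 4,000,000 bytes but about 33 kB short of 4 MiB. A limit expressed in decimal units would be exceeded, while one expressed as 4 MiB would not be.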

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
