| From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> | 
|---|---|
| To: | Marti Raudsepp <marti(at)juffo(dot)org>,Andrew Dunstan <andrew(at)dunslane(dot)net> | 
| Cc: | PGBuildFarm <pgbuildfarm-members(at)pgfoundry(dot)org> | 
| Subject: | Re: [Pgbuildfarm-members] Submission failures: 500 read timeout | 
| Date: | 2014-09-22 09:21:41 | 
| Message-ID: | 541FEA25.7080605@kaltenbrunner.cc | 
| Lists: | buildfarm-members | 
On 09/22/2014 11:15 AM, Marti Raudsepp wrote:
> On Mon, Sep 15, 2014 at 7:15 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> I have turned on request timing in the web logs. It looks like these status
>> uploads are typically taking 1 to 2 seconds to process. So I suspect it's
>> client-related.
>
> Well I managed to capture only 1 packet dump of this happening, on
> 2014-09-18 11:51:36 EEST. The problem seems to have disappeared, did
> configuration change on the server side? Or maybe it's just that fewer
> commits have been pushed recently. If anyone is interested, I can send
> the dump privately.
>
> I'm no expert on TCP, but it's conceivably a bug in the TCP stack. I'd
> like to collect a few more samples before bothering any networking
> people with it. Here's my understanding of what happened:
>
> 11:51:36.433 First packet of HTTP POST request is sent
> (data being sent)
> 11:51:39.254 Last packet of POST body
> 11:51:39.494 buildfarm responds with a SACK which, I believe,
> indicates a dropped packet
> (3 minutes pass silently)
> 11:54:38.010 My end sends a FIN, probably a timeout on client side,
> closing the socket
> 11:54:38.212 buildfarm responds with another SACK, repeating the missing packet
> (3 seconds, some retransmits occur for the missing data)
> 11:54:41.215 My end sends a RST (probably timeout because remote
> didn't have time to acknowledge the FIN yet)
> 11:54:41.236 Remote responds with "HTTP 200 OK", before it could have
> received my RST, but my local end no longer sees it because the
> connection is already reset.
>
> If my reading of RFC 2018 (SACK) is right, the sender must retransmit
> data after receiving a SACK packet if the missing data isn't
> acknowledged during the retransmit timeout. But this did not happen
> for 3 minutes. I don't know whether the receiver (buildfarm) should
> retransmit its SACK or not, but that only happened after it had
> received the FIN packet.
Hard to say, but that description sounds like a problem that was common
about 10 years ago, when stateful firewalls started doing sequence-number
inspection and randomisation but were not yet SACK-aware.
It might be a long stretch, but maybe the path between your box and the
buildfarm box is a bit lossy (as in small but regular packet loss) _AND_
there is a device on either side that has a slightly broken stateful
inspection firewall (old Cisco PIX/ASA, some SonicWALLs, Cisco FWSM, very,
very old Linux kernel ipchains/iptables issues) or very aggressive
timings on TCP sessions (i.e. misguided DoS prevention).
Stefan
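
For reference, the gaps in the timeline quoted above can be worked out with a few lines of Python. This is only a sketch: the timestamps are copied from Marti's description, while the event labels, the `EVENTS` list and the `gaps` helper are names made up here for illustration and are not part of the buildfarm client or any capture tool.

```python
from datetime import datetime

# Timestamps transcribed from the quoted timeline (client-side capture,
# 2014-09-18, EEST). Labels are paraphrased; EVENTS is a hypothetical name.
EVENTS = [
    ("11:51:36.433", "first packet of HTTP POST sent"),
    ("11:51:39.254", "last packet of POST body"),
    ("11:51:39.494", "server sends SACK"),
    ("11:54:38.010", "client sends FIN"),
    ("11:54:38.212", "server sends another SACK"),
    ("11:54:41.215", "client sends RST"),
    ("11:54:41.236", "server sends HTTP 200 OK"),
]


def gaps(events):
    """Return (seconds since previous event, label) for each event after the first."""
    times = [datetime.strptime(stamp, "%H:%M:%S.%f") for stamp, _ in events]
    return [
        ((cur - prev).total_seconds(), label)
        for prev, cur, (_, label) in zip(times, times[1:], events[1:])
    ]


for delta, label in gaps(EVENTS):
    print(f"+{delta:8.3f}s  {label}")
```

The gap before the FIN comes out at just under 180 seconds, which would fit a 3-minute read timeout on the client side. That is an inference from the arithmetic, not something the capture itself proves.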