Re: A couple of random BF failures in kerberosCheck

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of random BF failures in kerberosCheck
Date: 2019-08-03 22:42:48
Message-ID: 3397.1564872168@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> * kerberos/t/001_auth.pl just blithely assumes that it can pick
> any random port above 48K and that's guaranteed to be free.
> Maybe we should split out the code in get_new_node for finding
> a free TCP port, so we can call it here?

I've confirmed that the reason it's failing on my machine is exactly
that krb5kdc tries to bind to a port that still has a socket in
TIME_WAIT state.
Also, it looks like the socket is typically one that was used by the
GSSAPI client side (no surprise, the test leaves a lot more of those
than the one server socket), so we'd have no record of it even if we
were somehow saving state from prior runs.
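
To illustrate the general idea of probing for a free port rather than
assuming one, here is a minimal sketch. It is written in Python, not in
the Perl test framework, and it is neither the attached patch nor the
get_new_node code; the function names, the loopback address and the port
range are assumptions made for the example. It deliberately does not set
SO_REUSEADDR, so a port that is already in use is rejected, which on
many platforms includes a port with a socket lingering in TIME_WAIT.

```python
import random
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            # No SO_REUSEADDR: we want the bind to fail if the port is taken.
            s.bind((host, port))
        except OSError:
            return False
    return True


def pick_free_port(low=49152, high=65535, attempts=100):
    """Try random ports in [low, high] until one can be bound."""
    for _ in range(attempts):
        port = random.randint(low, high)
        if port_is_free(port):
            return port
    raise RuntimeError("no free port found")
```

A probe like this is inherently racy: another process can take the port
between the check and the moment the real server binds to it, so it
lowers the odds of a collision but does not eliminate them.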

So I propose the attached patch, which seems to fix this for me.

The particular case I'm looking at (running these tests in a tight
loop) is of course not that interesting, but I argue that it's just
increasing the odds of failure enough that I can isolate the cause.
A buildfarm animal running both the kerberos and ldap tests is almost
certainly at risk of the same failure, just with low probability.

(Still don't know what actually happened in those two buildfarm
failures, though.)

regards, tom lane

Attachment: select-unused-port-1.patch (text/x-diff, 6.3 KB)
