From: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
---|---|
To: | Jacob Champion <pchampion(at)vmware(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "andrew(dot)dunstan(at)2ndquadrant(dot)com" <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz> |
Subject: | Re: Support for NSS as a libpq TLS backend |
Date: | 2021-06-16 13:31:42 |
Message-ID: | 70A12E33-90B8-4761-8FA6-8DDB9EEA4D3E@yesql.se |
Lists: | pgsql-hackers |
> On 16 Jun 2021, at 01:50, Jacob Champion <pchampion(at)vmware(dot)com> wrote:
> I've been tracking down reference leaks in the client. These open
> references prevent NSS from shutting down cleanly, which then makes it
> impossible to open a new context in the future. This probably affects
> other libpq clients more than it affects psql.
Ah, nice catch, that's indeed a bug in the frontend implementation. The
problem is that the NSS trustdomain cache *must* be empty before shutting down
the context, else this very issue happens. Note this in be_tls_destroy():
    /*
     * It reads a bit odd to clear a session cache when we are destroying the
     * context altogether, but if the session cache isn't cleared before
     * shutting down the context it will fail with SEC_ERROR_BUSY.
     */
    SSL_ClearSessionCache();
Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.
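To illustrate why the ordering matters, here is a minimal toy model in plain C. It does not link against NSS, and every name in it (toy_context, toy_session_cache, TOY_ERROR_BUSY and the functions) is a hypothetical stand-in rather than an NSS or libpq symbol. It only assumes what is described above: cached sessions hold references on the context, and shutdown refuses to proceed while any reference is open.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are NSS or libpq types. */
typedef enum { TOY_OK = 0, TOY_ERROR_BUSY = 1 } toy_status;

typedef struct {
    int  open_refs;   /* references still held on the context */
    bool shut_down;   /* set once shutdown has succeeded */
} toy_context;

typedef struct {
    toy_context *owner;
    int          entries;   /* each cached session pins one reference */
} toy_session_cache;

/* Caching a session takes a reference on the owning context. */
void toy_cache_add(toy_session_cache *cache)
{
    cache->entries++;
    cache->owner->open_refs++;
}

/* Clearing the cache releases every reference the cache was holding. */
void toy_cache_clear(toy_session_cache *cache)
{
    cache->owner->open_refs -= cache->entries;
    cache->entries = 0;
}

/* Shutdown fails while anything still references the context. */
toy_status toy_context_shutdown(toy_context *ctx)
{
    if (ctx->open_refs > 0)
        return TOY_ERROR_BUSY;
    ctx->shut_down = true;
    return TOY_OK;
}

/*
 * Close path in the order discussed above: clear the cache first, then
 * shut down, and hand the result back to the caller instead of ignoring it.
 */
toy_status toy_close(toy_context *ctx, toy_session_cache *cache)
{
    toy_cache_clear(cache);
    return toy_context_shutdown(ctx);
}
```

In this model, calling toy_context_shutdown() directly after toy_cache_add() returns TOY_ERROR_BUSY and leaves the context open, while toy_close() succeeds because the cache is emptied first.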
There is another resource leak left (visible in one test after the above is
added): the SECMOD module needs to be unloaded if it has been loaded.
Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
have yet to figure out (it happens when acquiring a lock with NSSRWLock_LockRead).
> The first step to fixing that is not ignoring failures during NSS
> shutdown, so I've tried a patch to pgtls_close() that pushes any
> failures through the pqInternalNotice(). That seems to be working well.
I'm keeping these in during hacking, with a comment that they need to be
revisited during review since they are mainly useful for debugging.
> The tests were still mostly green, so I taught connect_ok() to fail if
> any stderr showed up, and that exposed quite a few failures.
With your patches I'm seeing a couple of these:
SSL error: The one-time function was previously called and failed. Its error code is no longer available
This is an error from NSPR, but it's not clear to me which PR_CallOnce call
it's coming from. It seems to be hitting in the SAN and CRL tests, so it
smells of some form of caching implemented with NSPR APIs to me, but that's a
mere hunch.
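For readers unfamiliar with that error text, the following is a toy, single-threaded model of call-once semantics in plain C. It is not the NSPR implementation and uses no NSPR symbols; toy_once, toy_call_once, toy_failing_init and the error constants are all hypothetical names. It sketches the assumed behaviour behind the message: the initializer runs at most once, and if it failed, later callers get the failure back with a generic "previously failed" code in place of the original cause.

```c
#include <stdbool.h>

/* Hypothetical error codes; not NSPR's. */
enum {
    TOY_ERR_NONE      = 0,
    TOY_ERR_ORIGINAL  = 1,   /* the real cause, set by the initializer */
    TOY_ERR_CALL_ONCE = 2    /* "previously called and failed" */
};

typedef struct {
    bool called;   /* has the initializer been run yet? */
    int  status;   /* its cached result: 0 on success, -1 on failure */
} toy_once;

int toy_last_error = TOY_ERR_NONE;   /* stand-in for a per-thread error slot */
int toy_init_calls = 0;              /* counts how often the initializer ran */

/* An initializer that always fails and records why. */
int toy_failing_init(void)
{
    toy_init_calls++;
    toy_last_error = TOY_ERR_ORIGINAL;
    return -1;
}

/*
 * Run fn on the first call only. Later calls return the cached status;
 * if that status is a failure, the original error is gone and only the
 * generic call-once error is reported.
 */
int toy_call_once(toy_once *once, int (*fn)(void))
{
    if (!once->called)
    {
        once->called = true;
        once->status = fn();
        return once->status;
    }
    if (once->status != 0)
        toy_last_error = TOY_ERR_CALL_ONCE;
    return once->status;
}
```

Under this model only the first caller can see the original error code, which is why the message alone does not say which one-time initializer failed or why.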
> I am currently stuck on one last failing test. This leak seems to only
> show up when using TLSv1.2 or below.
AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
renegotiation.
--
Daniel Gustafsson https://vmware.com/