From: | Jacob Champion <pchampion(at)vmware(dot)com> |
---|---|
To: | "daniel(at)yesql(dot)se" <daniel(at)yesql(dot)se> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "andrew(dot)dunstan(at)2ndquadrant(dot)com" <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net> |
Subject: | Re: Support for NSS as a libpq TLS backend |
Date: | 2021-06-16 16:15:56 |
Message-ID: | 14b33300304507db6f4e4d8a77da105a8d18ea1a.camel@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote:
> > On 16 Jun 2021, at 01:50, Jacob Champion <pchampion(at)vmware(dot)com> wrote:
> > I've been tracking down reference leaks in the client. These open
> > references prevent NSS from shutting down cleanly, which then makes it
> > impossible to open a new context in the future. This probably affects
> > other libpq clients more than it affects psql.
>
> Ah, nice catch, that's indeed a bug in the frontend implementation. The
> problem is that the NSS trustdomain cache *must* be empty before shutting down
> the context, else this very issue happens. Note this in be_tls_destroy():
>
> /*
> * It reads a bit odd to clear a session cache when we are destroying the
> * context altogether, but if the session cache isn't cleared before
> * shutting down the context it will fail with SEC_ERROR_BUSY.
> */
> SSL_ClearSessionCache();
>
> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.
That's unfortunate. The session cache is global, right? So I'm guessing
we'll need to refcount and lock that call, to avoid cleaning up out
from under a thread that's actively using the the cache?
> There is another resource leak left (visible in one test after the above is
> added), the SECMOD module needs to be unloaded in case it's been loaded.
> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).
>
> [...]
>
> With your patches I'm seeing a couple of these:
>
> SSL error: The one-time function was previously called and failed. Its error code is no longer available
Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the
moment) fixes all of the SSL tests for me, and I don't see either the
SECMOD leak or the "one-time function" error that you've mentioned.
What version of NSS are you running? I'm on 3.63.
I've attached my current patchset (based on v37) for comparison.
> > I am currently stuck on one last failing test. This leak seems to only
> > show up when using TLSv1.2 or below.
>
> AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
> renegotiation.
Nice, at least that mystery is solved. :D
Thanks,
--Jacob
Attachment | Content-Type | Size |
---|---|---|
0001-nss-don-t-ignore-failures-during-context-shutdown.patch.txt | text/plain | 1.3 KB |
0002-test-check-for-empty-stderr-during-connect_ok.patch.txt | text/plain | 3.6 KB |
0003-nss-clean-up-leaked-resources.patch.txt | text/plain | 3.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2021-06-16 16:23:28 | PoC: Using Count-Min Sketch for join cardinality estimation |
Previous Message | Stephen Frost | 2021-06-16 16:11:45 | Re: snapshot too old issues, first around wraparound and then more. |