Re: Proposal for implementing OCSP Stapling in PostgreSQL

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: David Zhang <david(dot)zhang(at)highgo(dot)ca>
Cc: Pgsql Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: Proposal for implementing OCSP Stapling in PostgreSQL
Date: 2024-07-15 23:59:24
Message-ID: CAOYmi+mqKLHUPO=Vjxj6+s8pQWLwLCv_KCzZ4B6du5hvXzRRsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 5, 2024 at 4:12 PM David Zhang <david(dot)zhang(at)highgo(dot)ca> wrote:
> This is the third version patch for "Certificate status check using OCSP
> Stapling" with ssl regression test cases added.

Hi David,

Thanks again for working on this! So far I've taken a look at the
design and tests. I've only skimmed the callback implementations; I
plan to review those in more detail once the architecture is nailed
down.

= Design =

It looks like this design relies on the DBA to manually prefetch OCSP
responses for their cert chain, and cache them in the local
ssl_ocsp_file. This is similar to Nginx's ssl_stapling_file directive
[1]. I think this may make sense for a v1 (much less code!), but it's
going to take a lot of good documentation on what, exactly, an admin
has to do to make sure their response file is fresh. IIUC it will
depend on the CA's policy and how they operate their responder.

We should probably be prepared for complaints that we don't run the
client ourselves. A v2 could maybe include an extension for running
the OCSP client in a background worker? Or maybe that's overkill, and
an existing job scheduler extension could do it, but having a custom
worker that could periodically check the response file and provide
warnings when it gets close to expiring might be helpful for admins.

I am worried about the multi-stapling that is being done in the tests,
for example the server-ca-ocsp-good response file. I think those tests
are cheating a bit. Yes, for this particular case, we can issue a
single signed response for both the intermediate and the leaf -- but
that's only because we issued both of them. What if your intermediate
and your root use different responders [2]? What if your issuer's
responder doesn't even support multiple certs per request [3]?

(Note that OpenSSL still doesn't support multi-stapling [4], even in
TLS 1.3 where we were supposed to get it "for free".)

I think it would be good for the sslocspstapling directive to 1) maybe
have a shorter name (cue bikeshed!) and 2) support more than a binary
on/off. For example, the current implementation could use
"disable/require" options, and then a v2 could add "prefer" which
simply sends the status request and honors must-staple extensions on
the certificate. (That should be safe to default to, I think, and it
would let DBAs control the stapling rollout more easily.)

A question from ignorance: how does the client decide that the
signature on the OCSP response is actually valid for the specific
chain in use?

= Tests =

I think the tests should record the expected_stderr messages for
failures (see the Code section below for why).

If it turns out that multi-stapling is safe, then IMO the tests should
explicitly test both TLSv1.2 and v1.3, since those protocols differ in
how they handle per-certificate status.

If it's okay with you, I'd like to volunteer to refactor out the
duplicated recipes in sslfiles.mk. I have some particular ideas in
mind and don't want to make you play fetch-a-rock. (No pressure to use
what I come up with, if you don't like it.)

Because a CRL is a valid fallback for OCSP in practice, I think there
should be some tests for their interaction. (For v1, maybe that's as
simple as "you're not allowed to use both yet", but it should be
explicit.)

= Code =

(this is a very shallow review)

+#define OCSP_CERT_STATUS_OK 1
+#define OCSP_CERT_STATUS_NOK (-1)

Returning f -1 from the callback indicates an internal error, so we're
currently sending the wrong alerts for OCSP failures ("internal error"
rather than "bad certificate status response") and getting unhelpful
error messages from libpq. For cases where the handshake proceeds
correctly but we don't accept the OCSP response status, I think we
should be returning zero.

Also, we should not stomp on the OCSP_ namespace (I thought these
macros were part of the official <openssl/ocsp.h> API at first).

+ /* set up OCSP stapling callback */
+ SSL_CTX_set_tlsext_status_cb(SSL_context, ocsp_stapling_cb);

It seems like this should only be done if ssl_ocsp_file is set?

Thanks again!
--Jacob

[1] https://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_stapling_file
[2] as appears to be the case for postgresql.org
[3] https://community.letsencrypt.org/t/bulk-ocsp-requests/168156/2
[4] https://github.com/openssl/openssl/pull/20945

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-07-16 00:21:46 Re: Parent/child context relation in pg_get_backend_memory_contexts()
Previous Message Joseph Koshakow 2024-07-15 23:55:22 Re: Remove dependence on integer wrapping