Re: Serverside SNI support in libpq

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Pgsql Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Serverside SNI support in libpq
Date: 2024-07-25 17:51:05
Message-ID: CAOYmi+k_YBsO3jnxx9HBcChNzkzRW=Erm4yiPGsKV2_6rU+-4g@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 10, 2024 at 7:23 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> The way multiple certificates are handled is that libpq creates one SSL_CTX for
> each at startup, and switch to the appropriate one when the connection is
> inspected.

I fell down a rabbit hole while testing this patch, so this review isn't
complete, but I don't want to delay it any longer. I see a few
possibly-related problems with the handling of SSL_context.

The first is that reloading the server configuration doesn't reset the
contexts list, so the server starts behaving in really strange ways
the longer you test. That's an easy enough fix, but things got weirder
when I did. Part of that weirdness is that SSL_context gets set to the
last initialized context, so fallback doesn't always behave in a
deterministic fashion. But we do have to set it to something, to
create the SSL object itself...
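
To make the two complaints above concrete, here is a minimal sketch of
the behavior I'd expect, written without OpenSSL so it stands alone. All
of the names (HostEntry, HostList, host_list_reset, and so on) are
hypothetical and are not taken from the patch; "ctx" merely stands in
for an SSL_CTX pointer. The idea is that a reload empties the list
first, and that the fallback is chosen explicitly rather than being
whichever context happened to be initialized last.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>            /* strcasecmp (POSIX) */

/* Hypothetical per-hostname entry; "ctx" stands in for an SSL_CTX pointer. */
typedef struct HostEntry
{
    const char *hostname;       /* not copied: caller keeps the string alive */
    void       *ctx;
    struct HostEntry *next;
} HostEntry;

typedef struct HostList
{
    HostEntry  *head;
    HostEntry  *fallback;       /* explicit default, not "last initialized" */
} HostList;

/* Drop every entry so that a configuration reload starts from an empty list. */
static void
host_list_reset(HostList *list)
{
    assert(list != NULL);
    while (list->head != NULL)
    {
        HostEntry  *next = list->head->next;

        free(list->head);
        list->head = next;
    }
    list->fallback = NULL;
}

/* Add an entry; the first one flagged is_default becomes the fallback. */
static int
host_list_add(HostList *list, const char *hostname, void *ctx, int is_default)
{
    HostEntry  *entry = malloc(sizeof(HostEntry));

    if (entry == NULL)
        return -1;
    entry->hostname = hostname;
    entry->ctx = ctx;
    entry->next = list->head;
    list->head = entry;
    if (is_default && list->fallback == NULL)
        list->fallback = entry;
    return 0;
}

/* Case-insensitive lookup; a missing or unknown name yields the fallback. */
static void *
host_list_lookup(const HostList *list, const char *servername)
{
    if (servername != NULL)
    {
        for (const HostEntry *e = list->head; e != NULL; e = e->next)
        {
            if (strcasecmp(e->hostname, servername) == 0)
                return e->ctx;
        }
    }
    return list->fallback ? list->fallback->ctx : NULL;
}
```

With this shape, the reload path would call host_list_reset() before
re-reading the configuration, and the context used to create the SSL
object would always be list->fallback, independent of load order.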

I tried patching all that, but I continue to see nondeterministic
behavior, including the wrong certificate chain occasionally being
served, and the servername callback being called twice for each
connection (?!).

Since I can't reproduce the weirdest bits under a debugger yet, I
don't really know what's happening. Maybe my patches are buggy. Or
maybe we're running into some chicken-and-egg madness? The order of
operations looks like this:

1. Create a list of contexts, selecting one as an arbitrary default
2. Create an SSL object from our default context
3. During the servername_callback, reparent that SSL object (which has
   an active connection underway) to the actual context we want to use
4. Complete the connection

It's step 3 that I'm squinting at. I wondered how, exactly, that
worked in practice, and based on this issue the answer might be "not
well":

https://github.com/openssl/openssl/issues/6109

Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is
fundamentally broken. That might just be FUD, but I'm wondering if we
should instead use the SSL_ flavors of the API to assign the
certificate chain to the SSL pointer directly, inside the callback,
instead of trying to set it indirectly via the SSL_CTX_ API.

Have you seen any weird behavior like this on your end? I'm starting
to doubt my test setup... On the plus side, I now have a handful of
debugging patches for a future commitfest.

Thanks,
--Jacob
