Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

From: David Benjamin <davidben(at)google(dot)com>
To: daniel(at)yesql(dot)se
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions
Date: 2024-02-12 16:31:21
Message-ID: CAF8qwaCskc_1vuAkOm9YWVCb2_Af2NvFMbERWso--CDAW5t=OQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 12, 2024 at 9:38 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:

> > On 11 Feb 2024, at 19:19, David Benjamin <davidben(at)google(dot)com> wrote:
> > It turns out the parts that came from the OpenSSL socket BIO were a
> no-op, and in fact PostgreSQL is relying on it being a no-op. Instead, it's
> cleaner to just define a custom BIO the normal way, which then leaves the
> more standard BIO_get_data mechanism usable. This also avoids the risk that
> a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl
> calling into it, and then break some assumptions made by PostgreSQL.
>
> + case BIO_CTRL_FLUSH:
> + /* libssl expects all BIOs to support BIO_flush. */
> + res = 1;
> + break;
>
> Will this always be true? Returning 1 implies that we have flushed all
> data on
> the socket, but what if we just before called BIO_set_retry..XX()?
>

The real one is also just a no-op. :-)
https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c#L215

This is used in places like buffer BIO or the FILE* BIO, where BIO_write
might accept data, but stage it into a buffer, to be flushed later. libssl
ends up calling BIO_flush at the end of every flight, which in turn means
that BIOs used with libssl need to support it, even if to return true
because there's nothing to flush. (Arguably TCP sockets could have used a
flush concept, to help control Nagle's algorithm, but for better or worse,
that's a socket-wide TCP_NODELAY option, rather than an explicit flush
call.)

BIO_set_retry.. behaves like POSIX I/O, where a failed EWOULDBLOCK write is
as if you never wrote to the socket at all and doesn't impact socket state.
That is, the data hasn't been accepted yet. It's not expected for BIO_flush
to care about the rejected write data. (Also I don't believe libssl will
ever trigger this case.) It's confusing because unlike an EWOULDBLOCK
errno, BIO_set_retry.. *is* itself BIO state, but that's just because the
BIO calling convention is goofy and didn't just return the error out of the
return value. So OpenSSL just stashes the bit on the BIO itself, for you to
query out immediately afterwards.

> > I've attached a patch which does that. The existing SSL tests pass with
> it, tested on Debian stable. (Though it took me a few iterations to figure
> out how to run the SSL tests, so it's possible I've missed something.)
>
> We've done a fair bit of work on making them easier to run, so I'm curious
> if
> you saw any room for improvements there as someone coming to them for the
> first
> time?
>

A lot of my time was just trying to figure out how to run the tests in the
first place, so perhaps documentation? But I may just have been looking in
the wrong spot and honestly didn't really know what I was doing. I can try
to summarize what I did (from memory), and perhaps that can point to
possible improvements?

- I looked in the repository for instructions on running the tests and
couldn't find any. At this point, I hadn't found src/test/README.
- I found
https://wiki.postgresql.org/wiki/Developer_FAQ#How_do_I_test_my_changes.3F,
linked from https://www.postgresql.org/developer/
- I ran the configure build with --enable-cassert, ran make check, tests
passed.
- I wrote my patch and then spent a while intentionally adding bugs to see
if the tests would catch it (I wasn't sure whether there was ssl test
coverage), finally concluding that I wasn't running any ssl tests
- I looked some more and found src/test/ssl/README
- I reconfigured with --enable-tap-tests and ran make check
PG_TEST_EXTRA=ssl per those instructions, but the SSL tests still weren't
running
- I grepped for PG_TEST_EXTRA and found references in the CI config, but
using the meson build
- I installed meson, mimicked a few commands from the CI. That seemed to
work.
- I tried running only the ssl tests, looking up how you specify individual
tests in meson, to make my compile/test cycles a bit faster, but they
failed.
- I noticed that the first couple "tests" were named like setup tasks, and
guessed that the ssl tests depended on this setup to run. But by then I
just gave up and waited out the whole test suite per run. :-)

Once I got it running, it was quite smooth. I just wasn't sure how to do it.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2024-02-12 16:37:46 Re: Patch: Add parse_type Function
Previous Message Nathan Bossart 2024-02-12 16:20:00 Re: backend *.c #include cleanup (IWYU)