Re: [PoC] Federated Authn/z with OAUTHBEARER

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>, Kashif Zeeshan <kashi(dot)zeeshan(at)gmail(dot)com>
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Date: 2025-01-14 01:00:00
Message-ID: CAOYmi+=CsAATregXpdRsZDUauu7W=hwzMuexhx--7Lz9qNg_xg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 13, 2025 at 3:21 PM Jacob Champion
<jacob(dot)champion(at)enterprisedb(dot)com> wrote:
> Next email will discuss the architectural bug that Kashif found.

Okay, here goes. A standard OAuth connection attempt looks like this
(oh, I hope Gmail doesn't mangle it):

  Issuer    User    libpq   Backend
    |        |        |
    |        x -----> x -----> o     [1] Startup Packet
    |        |        |        |
    |        |        x <----- x     [2] OAUTHBEARER Request
    |        |        |        |
    |        |        x -----> x     [3] Parameter Discovery
    |        |        |        |
    |        |        x <----- o     [4] Parameters Stored
    |        |        |
    |        |        |
    |        |        |
    |        |        x -----> o     [5] New Startup Packet
    |        |        |        |
    |        |        x <----- x     [6] OAUTHBEARER Request
    |        |        |        |
    x <----- x <----> x        |
    x <----- x <----> x        |     [7] OAuth Handshake
    x <----- x <----> x        |
    |        |        |        |
    o        |        x -----> x     [8] Send Token
             |        |        |
             | <----- x <----- x     [9] Connection Established
             |        |        |
             x <----> x <----> x
             x <----> x <----> x     [10] Use the DB
             .        .        .
             .        .        .
             .        .        .

When the server first asks for a token via OAUTHBEARER (step 2), the
client doesn't necessarily know what the server's requirements are for
a given user. It uses the rest of the doomed OAUTHBEARER exchange to
store the issuer and scope information in the PGconn (steps 3-4), then
disconnects and sets need_new_connection in PQconnectPoll() so that a
second connection is immediately opened (step 5). When the OAUTHBEARER
mechanism takes control the second time, it has everything it needs to
conduct the login flow with the issuer (step 7). It then sends the
obtained token to establish a connection (steps 8 onward).

The problem is that step 7 is consuming the authentication_timeout for
the backend. I'm very good at completing these flows quickly, but if
you can't complete the browser prompts in time, you will simply not be
able to log into the server. Which is harsh to say the least. (Imagine
the pain if the standard psql password prompt timed out.) DBAs can get
around it by increasing the timeout, obviously, but that doesn't feel
very good as a solution.
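
As a rough illustration of the timing problem (a toy model, not code from
libpq or the backend): the function name and parameters below are
hypothetical, and the 60-second figure is the documented default for
authentication_timeout. The model assumes the backend's timer starts around
step 5, so both the user's time in the browser (step 7) and the server-side
token validation (step 8) count against it.

```python
# Toy model of the second connection attempt; not libpq or backend code.
# All names here are hypothetical and exist only for illustration.

AUTHENTICATION_TIMEOUT = 60.0  # seconds; PostgreSQL's documented default


def second_connection_succeeds(handshake_seconds, validation_seconds=0.0,
                               timeout=AUTHENTICATION_TIMEOUT):
    """Return True if authentication finishes before the backend's timer expires.

    handshake_seconds:  time the user spends completing the OAuth flow (step 7)
    validation_seconds: time the server spends validating the token (step 8)
    """
    return handshake_seconds + validation_seconds < timeout


print(second_connection_succeeds(20))     # a user who is quick with the browser
print(second_connection_succeeds(90))     # a slow user: the backend hangs up first
print(second_connection_succeeds(59, 2))  # token sent in time, validation is not
```

The last call corresponds to the "59 seconds" case discussed in the next
paragraph: the client finishes just inside the limit, but validation pushes
the total past it.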

Last week I looked into a fix where libpq would simply try again with
the stored token if the backend hangs up on it during the handshake,
but I think that will end up making the UX worse. The token validation
on the server side isn't going to be instantaneous, so if the client
is able to complete the token exchange in 59 seconds and send it to
the backend, there's an excellent chance that the connection is still
going to be torn down in a way that's indistinguishable from a crash.
We don't want the two sides to fight for time.

So I think what I'm going to need to do is modify v41-0003 to allow
the mechanism to politely hang up the connection while the flow is in
progress. This further decouples the lifetimes of the mechanism and
the async auth -- the async state now has to live outside of the SASL
exchange -- but I think it's probably more architecturally sound. Yell
at me if that sounds unmaintainable or if there's a more obvious fix
I'm missing.
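
One possible reading of that proposal, sketched as a toy model: the token
state lives on a connection-level object that outlives each attempt, and the
login flow runs while no backend is waiting. This is only an assumption
about the shape of the fix, not the design in v41-0003; the class, method
names, and the placeholder issuer URL are all invented for illustration.

```python
# Hypothetical sketch; none of these names come from libpq or the patch set.

class Conn:
    """Stands in for a PGconn: it outlives individual connection attempts."""

    def __init__(self):
        self.issuer = None   # learned during parameter discovery (steps 3-4)
        self.token = None    # async auth result, kept outside the SASL exchange
        self.attempts = 0    # number of startup packets sent so far

    def attempt(self, run_flow):
        """Model one connection attempt; return 'connected' or 'retry'."""
        self.attempts += 1
        if self.issuer is None:
            # First attempt: the doomed exchange only discovers parameters.
            self.issuer = "https://issuer.example"  # placeholder value
            return "retry"
        if self.token is None:
            # The mechanism hangs up politely, so no authentication timer is
            # running while the user works through the browser prompts.
            self.token = run_flow(self.issuer)
            return "retry"
        # A fresh connection only has to send the stored token.
        return "connected"


def connect(run_flow):
    """Retry until the stored state is enough to finish authentication."""
    conn = Conn()
    while conn.attempt(run_flow) == "retry":
        pass
    return conn


conn = connect(lambda issuer: "example-token")
print(conn.attempts, conn.token)
```

In this model the cost is one extra connection attempt, in exchange for the
user's browser time no longer counting against the backend's timer.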

Huge thanks to Kashif for pointing this out!

--Jacob
