From: | Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Euler Taveira <euler(dot)taveira(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se> |
Subject: | Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible |
Date: | 2025-03-07 18:28:16 |
Message-ID: | CAOYmi+nkEPh4724NKci7hFiZBbu4WEO99bL7=fu+EvTAb3N8uw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Mar 7, 2025 at 9:25 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I should have clarified - there are a few that I think are ok, basically the
> places where we wrap syscalls, e.g. around the sendto, select and recvfrom in
> PerformRadiusTransaction().
Okay.
> OTOH that code is effectively completely broken. Doing a blocking select() is
> just a no-go, the code isn't interruptible, breaking authentication
> timeout. And using select() means that we theoretically could crash due to an
> fd that's above FD_SETSIZE.
I think we're in agreement here; I'm just trying to improve things
incrementally. If someone actually hits the broken case, I think it'd
be helpful for them to see it.
> I think some of the wrapped calls into library code might actually call back
> into our code (to receive/send data), and our code then will use wait events
> around lower level operations done as part of that.
That would be a problem, agreed, but I didn't think I'd wrapped any
callback APIs. (Admittedly I have little experience with the SSPI
stuff.) But looking at the wrapped calls in the patch... which are you
suspicious of?
> It's also IMO quite wrong to do something that can throw an error inside a
> wait event, because that means that the wait event will still be reported
> during error recovery.
Hm, okay. I can change that for the LookupAccountSid case.
> Probably not the only place doing so, but it's still
> wrong.
It's definitely not the only place. :D
> Why stop with
> these functions and not just do that for *all* functions in postgres? I mean
> it'd not work and slow everything down,
(That seems like a good reason not to do it for all functions in
Postgres, no? I hope the slope is not all that slippery in practice.)
> but how do you define that line?
Cost/benefit. In this case, authentication hanging in an unknown place
in PAM and LDAP has caused tangible support problems. I suspect I'd
have gotten complaints if I only focused on those two places, though,
so I expanded it to the other blocking calls I could see.
Thanks,
--Jacob
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2025-03-07 18:41:35 | Re: Statistics Import and Export |
Previous Message | Tom Lane | 2025-03-07 18:27:30 | Re: Add column name to error description |