From: | Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Euler Taveira <euler(dot)taveira(at)enterprisedb(dot)com> |
Subject: | Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible |
Date: | 2025-02-10 16:23:30 |
Message-ID: | CAOYmi+=azjJ064LfZG0pJ6DBjx9Ro0Kzdd+Z2Jyyy=66cqL+8w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Feb 6, 2025 at 12:35 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> /* Update app name to current GUC setting */
> + /* TODO: ask the list: maybe do this before setting STATE_UNDEFINED? */
> if (application_name)
> pgstat_report_appname(application_name);
>
> Historically, we've always reported the application name after
> STATE_UNDEFINED is set, never the reverse. There could be potential
> security implications if we were to begin reporting the application
> name before the connection has fully authenticated because it would be
> possible for attempted connections to show malicious data in the
> catalogs, and now we're sure that the catalog data is OK to query for
> other users. Let's be careful about that. I think that we should
> still set that as late as possible in pgstat_bestart_final(), never at
> an earlier step as this data can come from a connection
> string, potentially malicious if not authenticated yet.
I don't want to change the order of authentication and application
name, just move the application name report up above the state change
in pgstat_bestart_final(). I don't feel strongly about it, though.
> + if (!bootstrap)
> + {
> + pgstat_bestart_initial();
> + if (MyProcPort)
> + pgstat_bestart_security(); /* fill in any SSL/GSS info too */
>
> This part of patch 0001 is giving me a very long pause. What's the
> merit of filling in twice the backend entry data if we're going to
> update it at the end anyway for normal backend connections? More info
> for debugging, is it?
Correct; if you already know authentication information from the
transport then it seemed nice to fill it in before doing the stuff
that hangs.
> It seems that removing this call does not
> influence the tests you are adding, as well (or should the test
> "pg_stat_ssl view is updated" be responsible for checking that?).
The test is supposed to enforce that, but I see that it's not for some
reason. That's concerning. I'll investigate, thanks for pointing it
out.
> With this patch, the information that we get to be able to debug a
> backend entry in pg_stat_activity is st_clientaddr and
> remote_hostname. If we have a backend stuck in a wait event for the
> "Auth" class, would these be sufficient to provide a good user
> experience? Still better than nothing as we don't know the database
> and user ID used for the connection until authentication is done, I
> guess, to be able to grab patterns behind authentication getting
> stuck.
It's better than nothing, but is there a particular reason not to
trust the crypto the first time around? The SSL/GSS details are what
they are; if you don't trust them at either point one or point two
then I think we need to have a more urgent conversation about that?
> In patch 0002, WAIT_EVENT_SSPI_TRANSLATE_NAME is used twice for two
> code paths. Perhaps these should be two separate wait events?
That was my question from v6:
> I violated the "one event name per call site" rule with
> TranslateName(). The call pattern there is "call once to figure out
> the buffer length, then call again to fill it in", and IMO that didn't
> deserve differentiation. But if anyone objects, I'm happy to change it
> (and I'd appreciate some name suggestions in that case).
So -- any name suggestions? :D (Personally, I viewed this sort of like
an unrolled loop of two. I don't know if it helps to know which call
is hanging as long as you know that TranslateName is hanging.)
> The events attached to ldap_unbind() make that also kind of hard to
> debug. What's the code path actually impacted if we see them in
> pg_stat_activity? We have nine of them with the same wait event
> attached to them.
Do we _want_ nine separate flavors of WAIT_EVENT_LDAP_UNBIND? I
figured it was enough to know that you were stuck unbinding.
> The patch needs some 2024 -> 2025 updates in its copyright notices.
Will do.
> At the end of [1], I've been reminded of this quote from Andres about
> 0002:
> "This doesn't really seem like it's actually using wait events to
> describe waits. The new wait events cover stuff like memory
> allocations etc, see e.g. pg_SSPI_make_upn()."
>
> Should we do improvements in this area with interruptions before
> attempting to tackle the problem of making starting backend entries
> visible in pg_stat_activity, before these complete authentication? We
> use wait events for syscalls on I/O already, perhaps that does not
> stand as an argument in favor of this patch on its own, but I don't
> see why more events associated to external library calls as done in
> 0002 is a problem?
I had hoped that my v6 addressed Andres' concerns by pushing the
events down as far as possible (that way they can't be nested, which I
took to be the primary problem). Does something else need to be done?
Thanks,
--Jacob
From | Date | Subject | |
---|---|---|---|
Next Message | Ranier Vilela | 2025-02-10 16:31:20 | Small memory fixes for pg_createsubcriber |
Previous Message | David G. Johnston | 2025-02-10 16:21:48 | Re: describe special values in GUC descriptions more consistently |