Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Euler Taveira <euler(dot)taveira(at)enterprisedb(dot)com>
Subject: Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Date: 2024-09-10 22:33:31
Message-ID: ZuDJO0C-lhCH_UfF@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 10, 2024 at 01:58:50PM -0700, Noah Misch wrote:
> On Tue, Sep 10, 2024 at 02:51:23PM -0400, Robert Haas wrote:
>> I think unique names are a good idea. If a user doesn't care about the
>> difference between sdgjsA and sdjgsB, they can easily ignore the
>> trailing suffix, and IME, people typically do that without really
>> stopping to think about it. If on the other hand the two are lumped
>> together as sdjgs and a user needs to distinguish them, they can't. So
>> I see unique names as having much more upside than downside.
>
> I agree a person can ignore the distinction, but that requires the person to
> be consuming the raw event list. It's reasonable to tell your monitoring tool
> to give you the top N wait events. Individual AuthnLdap* events may all miss
> the cut even though their aggregate would have made the cut. Before you know
> to teach that monitoring tool to group AuthnLdap* together, it won't show you
> any of those names.

That's a fair point. I use a bunch of aggregates with group bys for
any monitoring queries looking for event point patterns. In my
experience, when dealing with enough connections, patterns show up
anyway even if there is noise because some of the events that I was
looking for are rather short-term, like a sync events interleaving
with locks storing an average of the events into a secondary table
with some INSERT SELECT.

> I felt commit c789f0f also chose sub-optimally in this respect, particularly
> with the DblinkGetConnect/DblinkConnect pair. I didn't feel strongly enough
> to complain at the time, but a rule of "each wait event appears in one
> pgstat_report_wait_start()" would be a rule I don't want. One needs
> familiarity with the dblink implementation internals to grasp the
> DblinkGetConnect/DblinkConnect distinction, and a plausible refactor of dblink
> would make those names cease to fit. I see this level of fine-grained naming
> as making the event name a sort of stable proxy for FILE:LINE. I'd value
> exposing such a proxy, all else being equal, but I don't think wait event
> names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way. Wait
> event names should be more independent of today's code-level details.

Depends. I'd rather choose more granularity to know exactly which
part of the code I am dealing with, especially in the case of this
thread where these are embedded around external function calls. If,
for example, one notices that a stack of pg_stat_activity scans are
complaining about a specific step in the authentication process, it is
going to offer a much better hint than having to guess which part of
the authentication step is slow, like in LDAP.

Wait event additions are also kind of cheap in terms of maintenance in
core, creating a new translation cost. So I also think there are more
upsides to be wilder here with more points and more granularity.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-09-10 22:35:35 Re: Refactoring postmaster's code to cleanup after child exit
Previous Message Michael Paquier 2024-09-10 22:23:09 Re: Retire support for OpenSSL 1.1.1 due to raised API requirements