Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Euler Taveira <euler(dot)taveira(at)enterprisedb(dot)com>
Subject: Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Date: 2024-11-07 21:36:59
Message-ID: huirba5tv7kyztyfduionaucesah6ct54lwzemeoemft7hz5vj@2k45xqxtovl7
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-11-07 12:11:46 -0800, Jacob Champion wrote:
> On Thu, Nov 7, 2024 at 11:41 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I think the patch should not be merged as-is. It's just too ugly and fragile.
>
> Understood; I'm trying to find a way forward, and I'm pointing out
> that the alternatives I've tried seem to me to be _more_ fragile.
>
> Are there any items in this list that you disagree with/are less
> concerned about?
>
> - the pre-auth step must always initialize the entire pgstat struct

Correct. And that has to happen exactly once, not twice.

> - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY()
> calls for _every_ backend

That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write
side. That's the whole point of of the sequence-lock like mechanism.

> - two-step initialization requires us to couple against the order that
> authentication information is being filled in (pre-auth, post-auth, or
> both)

Not sure what you mean with this?

> > I think it might make more sense to use pgstat_report_activity() or such here,
> > rather than using wait events to describe something that's not a wait.
>
> I'm not sure why you're saying these aren't waits. If pam_authenticate
> is capable of hanging for hours and bringing down a production system,
> is that not a "wait"?

It may or may not be. If you increase the iteration count for whatever secret
"hashing" method to be very high, it's not a wait, it's just CPU
use. Similarly, if you have a cpu expensive WHERE clause, that's not a
wait. But if you wait for network IO due to pam using ldap underneath or you
need to read toast values from disk, those are waits.

> > > I agree that would be amazing! I'm not about to tackle reliable
> > > interrupts for all of the current blocking auth code for v18, though.
> > > I'm just trying to make it observable when we do something that
> > > blocks.
> >
> > Well, with that justification we could end up adding wait events for large
> > swaths of code that might not actually ever wait.
>
> If it were hypothetically useful to do so, would that be a problem?
> I'm trying not to propose things that aren't actually useful.

My point is that you're redefining wait events to be "in some region of code"
and that once you start doing that, there's a lot of other places you could
suddenly use wait events.

But wait events aren't actually suitable for that - they're a *single-depth*
mechanism, which means that if you start waiting, the prior wait event is
lost, and
a) the nested region isn't attributed to the parent while active
b) once the nested wait event is over, the parent isn't reset

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-11-07 21:38:09 Re: Commit Timestamp and LSN Inversion issue
Previous Message Nathan Bossart 2024-11-07 21:36:38 Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C