Quick Links

Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Euler Taveira <euler(dot)taveira(at)enterprisedb(dot)com>
Subject:	Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Date:	2024-11-07 22:56:01
Message-ID:	lgo4suf3r22vr47holahhokeoljk4sy2sdnoa6qqos2w6q3ule@ijwjocrqlish
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2024-11-07 14:29:18 -0800, Jacob Champion wrote:
> On Thu, Nov 7, 2024 at 1:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > - the pre-auth step must always initialize the entire pgstat struct
> >
> > Correct. And that has to happen exactly once, not twice.
>
> What goes wrong if it happens twice?

Primarily it's architecturally wrong. For no reason that I can see.

It does actually make things harder - what if somebody added a
pgstat_report_activity() somewhere between the call? It would suddenly get
lost after the second "initialization". Actually, the proposed patch already
has weird, externally visible, consequences - the application name is set,
then suddenly becomes unset, then is set again.

> > > - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY()
> > > calls for _every_ backend
> >
> > That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write
> > side. That's the whole point of of the sequence-lock like mechanism.
>
> Okay, cool. I'll retract that concern.

Additionally PGSTAT_BEGIN_WRITE_ACTIVITY() would already happen twice if you
initialize twice...

> > > - two-step initialization requires us to couple against the order that
> > > authentication information is being filled in (pre-auth, post-auth, or
> > > both)
> >
> > Not sure what you mean with this?
>
> In other words: if we split it, people who make changes to the order
> that authentication information is determined during startup must know
> to keep an eye on this code as well. Whereas with the current
> patchset, the layers are decoupled and the order doesn't matter.
> Quoting from above:
>
> Finally, if we're okay with all of that, future maintainers need to be
> careful about which fields get copied in the first (preauth) step, the
> second step, or both. GSS, for example, can be set up during transport
> negotiation (first step) or authentication (second step), so we have
> to duplicate the logic there. SSL is currently first-step-only, I
> think -- but are we sure we want to hardcode the assumption that cert
> auth can't change any of those parameters after the transport has been
> established? (I've been brainstorming ways we might use TLS 1.3's
> post-handshake CertificateRequest, for example.)

That doesn't seem like a reason to just initialize twice to me. If you have
one initialization step that properly initializes everything to a minimal
default state, you then can have granular functions that set up the user,
database, SSL, GSS information separately.

> > If you increase the iteration count for whatever secret
> > "hashing" method to be very high, it's not a wait, it's just CPU
> > use.
>
> I don't yet understand why this is a useful distinction to make. I
> understand that they are different, but what are the bad consequences
> if pg_stat_activity records a CPU busy wait in the same way it records
> an I/O wait -- as long as they're not nested?

Well, first, because you suddenly can't use wait events anymore to find waits.

But more importantly, because of not having nesting, adding one "coarse" "wait
event" means that anyone adding a wait event at a finer grade suddenly needs
to be aware of all the paths that could lead to the execution of the new
code and change all of them to not use the wait event anymore. It imposes a
tax on measuring actual "out of postgres" wait events.

> > My point is that you're redefining wait events to be "in some region of code"
> > and that once you start doing that, there's a lot of other places you could
> > suddenly use wait events.
> >
> > But wait events aren't actually suitable for that - they're a *single-depth*
> > mechanism, which means that if you start waiting, the prior wait event is
> > lost, and
> > a) the nested region isn't attributed to the parent while active
> > b) once the nested wait event is over, the parent isn't reset
>
> I understand that they shouldn't be nested. But as long as they're
> not, isn't that fine? And if the concern is that they'll accidentally
> get nested, whether in this patch or in the future, can't we just
> programmatically assert that we haven't?

One very useful wait event would be for memory allocations that hit the
kernel. Those can take a fairly long time, because they might need to write
dirty buffers to disk before there is enough free memory. Now imagine that we
redefine the system memory allocator (or just postgres') so that all memory
allocations from the kernel use a wait event. Now suddenly all that code that
uses "coarse" wait events suddenly has a *rare* path - because most of the time
we can carve memory out of a larger OS level memory allocation - where wait
events would be nested.

Greetings,

Andres Freund

In response to

Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible at 2024-11-07 22:29:18 from Jacob Champion

Responses

Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible at 2024-11-08 00:38:45 from Jacob Champion

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Masahiko Sawada	2024-11-07 23:13:32	Re: New "single" COPY format
Previous Message	Peter Geoghegan	2024-11-07 22:42:47	Re: Avoiding superfluous buffer locking during nbtree backwards scans