From: | Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Euler Taveira <euler(dot)taveira(at)enterprisedb(dot)com> |
Subject: | Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible |
Date: | 2024-11-07 18:44:25 |
Message-ID: | CAOYmi+kUkCkSV13-a0b0rwmtge_uZnPKQx-yLFQJdfoceWFH2g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Nov 7, 2024 at 10:12 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I don't understand why the pgstat_bestart()/pgstat_bestart_pre_auth() split
> makes sense. The latter is going to redo most of the work that the former
> did. What's the point of that?
>
> Why not have a new function that initializes just the missing additional
> information? Or for that matter, why not move most of what pgstat_bestart()
> does into pgstat_beinit()?
I talk about that up above [1]. I agree that this is all complicated
and fragile, but at the moment, I think splitting things apart is not
going to reduce the complexity in any way. I'm all ears for a
different approach, though (and it sounds like Michael is taking a
stab at it too).
> This doesn't really seem like it's actually using wait events to describe
> waits. The new wait events cover stuff like memory allocations etc, see
> e.g. pg_SSPI_make_upn().
I've also asked about the "scope" of the waits in the OP [2]. I can
move them downwards in the stack, if you'd prefer.
All of these are intended to cover parts of the code that can actually
hang, but for things like SSPI I'm just working off of inspection and
Win32 documentation. So if it's not actually true that some of these
call points can hang, let me know and I can remove them. (For the
particular example you called out, I'm just trying to cover both calls
to TranslateName() in a maintainable place. The documentation says
"TranslateName fails if it cannot bind to Active Directory on a domain
controller." which seemed pretty wait-worthy to me.)
> This isn't just pedantry - all the relevant code really needs to be rewritten
> to allow the blocking to happen in an interruptible way, otherwise
> authentication timeout etc can't realiably work. Once that's done you can
> actually define useful wait events too.
I agree that would be amazing! I'm not about to tackle reliable
interrupts for all of the current blocking auth code for v18, though.
I'm just trying to make it observable when we do something that
blocks.
--Jacob
[1] https://www.postgresql.org/message-id/CAOYmi%2BkLzSWrDHZbJg8bWZ94oP_K98mkoEvetgupOBVoy5H_ag%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAOYmi%2B%3D60deN20WDyCoHCiecgivJxr%3D98s7s7-C8SkXwrCfHXg%40mail.gmail.com
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2024-11-07 19:00:03 | Re: magical eref alias names |
Previous Message | Tomas Vondra | 2024-11-07 18:39:55 | Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 |