Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)

From: Adam Sjøgren <asjo(at)koldfront(dot)dk>
To: pgsql-general(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
Date: 2020-08-19 13:18:57
Message-ID: 87tuwyojsu.fsf@tullinup.koldfront.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom writes:

> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>> * Adam Sjøgren (asjo(at)koldfront(dot)dk) wrote:
>>> Sometimes new database logins slow down, from usually taking <0.05s to
>>> taking minutes. This is for psql as a normal user using Kerberos, for
>>> psql as the postgres superuser, for the web-application logging into the
>>> database, for everything.
>
> I think looking into pg_stat_activity and pg_locks might be interesting,
> although it's possible the delay is before the new session has made
> any entries there.
>
> Also, obtaining stack traces from a few of the stuck processes would
> likely be quite informative.

I will try to look at pg_stat_activity and pg_locks the next time - it
just happened now and I focused on getting some stacktraces -
unfortunately I do not have the -dbg package installed, so I don't know
how useful they are.

Here's the first one:

$ ps faux | grep postgres | grep authentication
postgres 16697 0.0 0.0 34686088 8352 ? Ss 14:45 0:00 \_ postgres: 11/main: workqueue XXdb 127.0.0.1(42534) authentication
postgres 16701 0.0 0.0 34686088 6476 ? Ss 14:45 0:00 \_ postgres: 11/main: anonymous XXdb 127.0.0.1(42536) authentication
postgres 16702 0.0 0.0 34686088 8348 ? Ss 14:45 0:00 \_ postgres: 11/main: svc_user XXdb 127.0.0.1(42540) authentication
postgres 16704 0.0 0.0 34686040 5376 ? Ss 14:45 0:00 \_ postgres: 11/main: postgres postgres [local] authentication
postgres 16705 0.0 0.0 34686088 8352 ? Ss 14:45 0:00 \_ postgres: 11/main: svc_user XXdb 127.0.0.1(42542) authentication
postgres 16708 0.0 0.0 34686088 6476 ? Ss 14:45 0:00 \_ postgres: 11/main: svc_user XXdb 127.0.0.1(42544) authentication
$ sudo gdb -p 16708
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
[...]
Attaching to process 16708
Reading symbols from /usr/lib/postgresql/11/bin/postgres...(no debugging symbols found)...done.
[...]
0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
86 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
#1 0x000055f60d3d0101 in WaitEventSetWait ()
#2 0x000055f60d2dccff in secure_read ()
#3 0x000055f60d2e9138 in ?? ()
#4 0x000055f60d2e9e75 in pq_getbyte ()
#5 0x000055f60d3f3100 in PostgresMain ()
#6 0x000055f60d102bf8 in ?? ()
#7 0x000055f60d37e492 in PostmasterMain ()
#8 0x000055f60d103fa5 in main ()
(gdb)

I think I was too slow on this one, and this is what it looks like if
the process is just waiting for a connection (as you can guess, I'm not
used to looking at backtraces).

Here is another one, which hopefully is caught at a better moment, and
more useful:

postgres 19495 0.0 0.0 34686088 8412 ? Ss 14:48 0:00 \_ postgres: 11/main: workqueue XXdb 127.0.0.1(43390) authentication
postgres 19497 0.0 0.0 34686088 6476 ? Ss 14:48 0:00 \_ postgres: 11/main: XXX XXdb 127.0.0.1(43392) authentication
postgres 19502 0.0 0.0 34686088 8412 ? Ss 14:48 0:00 \_ postgres: 11/main: XXX XXdb 127.0.0.1(43394) authentication
$ sudo gdb -p 19495
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
[...]
Attaching to process 19495
[...]
0x00007f8272cf2827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0 0x00007f8272cf2827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem(at)entry=0x7f7a2e9f1d38, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f1d38, abstime=0x0) at sem_waitcommon.c:181
#3 0x00007f8272cf297a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4 0x000055f60d36c252 in PGSemaphoreLock ()
#5 0x000055f60d3e35cc in LWLockAcquire ()
#6 0x000055f60d3d14ee in ProcArrayAdd ()
#7 0x000055f60d3e01a3 in InitProcessPhase2 ()
#8 0x000055f60d51c99b in InitPostgres ()
#9 0x000055f60d3f2baf in PostgresMain ()
#10 0x000055f60d102bf8 in ?? ()
#11 0x000055f60d37e492 in PostmasterMain ()
#12 0x000055f60d103fa5 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007f8272cf2827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205 in ../sysdeps/unix/sysv/linux/futex-internal.h
(gdb) bt
#0 0x00007f8272cf2827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem(at)entry=0x7f7a2e9f1d38, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f1d38, abstime=0x0) at sem_waitcommon.c:181
#3 0x00007f8272cf297a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4 0x000055f60d36c252 in PGSemaphoreLock ()
#5 0x000055f60d3e35cc in LWLockAcquire ()
#6 0x000055f60d3d14ee in ProcArrayAdd ()
#7 0x000055f60d3e01a3 in InitProcessPhase2 ()
#8 0x000055f60d51c99b in InitPostgres ()
#9 0x000055f60d3f2baf in PostgresMain ()
#10 0x000055f60d102bf8 in ?? ()
#11 0x000055f60d37e492 in PostmasterMain ()
#12 0x000055f60d103fa5 in main ()
(gdb) cont
Continuing.
(gdb) bt

Program received signal SIGUSR1, User defined signal 1.
0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
86 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
#1 0x000055f60d3d0101 in WaitEventSetWait ()
#2 0x000055f60d3d0574 in WaitLatchOrSocket ()
#3 0x000055f60d3e0b41 in ProcSleep ()
#4 0x000055f60d3db249 in ?? ()
#5 0x000055f60d3dc591 in LockAcquireExtended ()
#6 0x000055f60d3da365 in LockTuple ()
#7 0x000055f60d144ea6 in heap_update ()
#8 0x000055f60d2c59ff in ?? ()
#9 0x000055f60d2c609b in ?? ()
#10 0x000055f60d2a0d8b in standard_ExecutorRun ()
#11 0x00007f8269670e45 in ?? () from /usr/lib/postgresql/11/lib/pg_stat_statements.so
---Type <return> to continue, or q <return> to quit---
#12 0x000055f60d3f5cd5 in ?? ()
#13 0x000055f60d3f5f28 in ?? ()
#14 0x000055f60d3f6be5 in PortalRun ()
#15 0x000055f60d3f26a1 in ?? ()
#16 0x000055f60d3f3bcb in PostgresMain ()
#17 0x000055f60d102bf8 in ?? ()
#18 0x000055f60d37e492 in PostmasterMain ()
#19 0x000055f60d103fa5 in main ()
(gdb) cont
Continuing.
(gdb) bt
(gdb) cont
(gdb) cont
^C
Program received signal SIGINT, Interrupt.
0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
86 in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0 0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
#1 0x000055f60d3d0101 in WaitEventSetWait ()
#2 0x000055f60d2dccff in secure_read ()
#3 0x000055f60d2e9138 in ?? ()
#4 0x000055f60d2e9e75 in pq_getbyte ()
#5 0x000055f60d3f3100 in PostgresMain ()
#6 0x000055f60d102bf8 in ?? ()
#7 0x000055f60d37e492 in PostmasterMain ()
#8 0x000055f60d103fa5 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
86 in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0 0x00007f8270a039f3 in delete_module () at ../sysdeps/unix/syscall-template.S:86
#1 0x000055f60d3d0101 in WaitEventSetWait ()
#2 0x000055f60d2dccff in secure_read ()
#3 0x000055f60d2e9138 in ?? ()
#4 0x000055f60d2e9e75 in pq_getbyte ()
#5 0x000055f60d3f3100 in PostgresMain ()
#6 0x000055f60d102bf8 in ?? ()
#7 0x000055f60d37e492 in PostmasterMain ()
#8 0x000055f60d103fa5 in main ()
(gdb) quit
A debugging session is active.

Inferior 1 [process 19495] will be detached.

Quit anyway? (y or n) n
Not confirmed.
(gdb) detach
Detaching from program: /usr/lib/postgresql/11/bin/postgres, process 19495
(gdb) quit

Is this helpful, or should I do something different?

Here is a final one:

postgres 21257 0.0 0.0 34686088 6476 ? Ss 14:52 0:00 \_ postgres: 11/main: svc_user XXdb 127.0.0.1(44192) authentication
$ sudo gdb -p 21257
[...]
0x00007f8272cf2827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f7a2ea001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0 0x00007f8272cf2827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f7a2ea001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem(at)entry=0x7f7a2ea001b8, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2ea001b8, abstime=0x0) at sem_waitcommon.c:181
#3 0x00007f8272cf297a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4 0x000055f60d36c252 in PGSemaphoreLock ()
#5 0x000055f60d3e35cc in LWLockAcquire ()
#6 0x000055f60d3d14ee in ProcArrayAdd ()
#7 0x000055f60d3e01a3 in InitProcessPhase2 ()
#8 0x000055f60d51c99b in InitPostgres ()
#9 0x000055f60d3f2baf in PostgresMain ()
#10 0x000055f60d102bf8 in ?? ()
#11 0x000055f60d37e492 in PostmasterMain ()
#12 0x000055f60d103fa5 in main ()
(gdb) cont
Continuing.
bt

Program received signal SIGUSR1, User defined signal 1.
0x000055f60d10e50e in slot_getsomeattrs ()
(gdb) bt
#0 0x000055f60d10e50e in slot_getsomeattrs ()
#1 0x000055f60d29d260 in ?? ()
#2 0x000055f60d2a9965 in ExecScan ()
#3 0x000055f60d2b7b2a in ?? ()
#4 0x000055f60d2c71e6 in ?? ()
#5 0x000055f60d2a0d8b in standard_ExecutorRun ()
#6 0x00007f8269670e45 in ?? () from /usr/lib/postgresql/11/lib/pg_stat_statements.so
#7 0x000055f60d3f571b in ?? ()
#8 0x000055f60d3f6d40 in PortalRun ()
#9 0x000055f60d3f26a1 in ?? ()
#10 0x000055f60d3f3bcb in PostgresMain ()
#11 0x000055f60d102bf8 in ?? ()
#12 0x000055f60d37e492 in PostmasterMain ()
#13 0x000055f60d103fa5 in main ()
(gdb) cont
Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x000055f60d10e50e in slot_getsomeattrs ()
(gdb) bt
#0 0x000055f60d10e50e in slot_getsomeattrs ()
#1 0x000055f60d29d260 in ?? ()
#2 0x000055f60d2a9965 in ExecScan ()
#3 0x000055f60d2b7b2a in ?? ()
#4 0x000055f60d2c71e6 in ?? ()
#5 0x000055f60d2a0d8b in standard_ExecutorRun ()
#6 0x00007f8269670e45 in ?? () from /usr/lib/postgresql/11/lib/pg_stat_statements.so
#7 0x000055f60d3f571b in ?? ()
#8 0x000055f60d3f6d40 in PortalRun ()
#9 0x000055f60d3f26a1 in ?? ()
#10 0x000055f60d3f3bcb in PostgresMain ()
#11 0x000055f60d102bf8 in ?? ()
#12 0x000055f60d37e492 in PostmasterMain ()
#13 0x000055f60d103fa5 in main ()
(gdb) cont
Continuing.
(gdb) bt
[Inferior 1 (process 21257) exited normally]
(gdb) bt
No stack.
(gdb) cont
The program is not being run.
(gdb) quit

Let me know if there is something else I should try.

Best regards,

Adam

--
"It is a sort of cheap and cheerful kind of Adam Sjøgren
abstraction, but it works well in practise." asjo(at)koldfront(dot)dk

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Laura Smith 2020-08-19 13:38:51 Inline count on a query
Previous Message Jehan-Guillaume de Rorthais 2020-08-19 13:07:13 Re: BUG? Slave don't reconnect to the master