libpq contention due to gss even when not using gss

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robbie Harwood <rharwood(at)redhat(dot)com>
Subject: libpq contention due to gss even when not using gss
Date: 2024-06-10 18:12:12
Message-ID: 20240610181212.auytluwmbfl7lb5n@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

To investigate a report of both postgres and pgbouncer having issues when a
lot of new connections are established, I used pgbench -C. Oddly, on an
early attempt, the bottleneck wasn't postgres+pgbouncer; it was pgbench. But
only when using TCP, not with unix sockets.

c=40;pgbench -C -n -c$c -j$c -T5 -f <(echo 'select 1') 'port=6432 host=127.0.0.1 user=test dbname=postgres password=fake'

host=127.0.0.1:                     16465
host=127.0.0.1,gssencmode=disable:  20860
host=/tmp:                          49286

Note that the server does *not* support gss, yet gss has a substantial
performance impact.

Obviously the connection rates here are absurdly high and, outside of badly
written applications, likely never practically relevant. However, the number
of cores in systems is going up, and this quite possibly will become relevant
in more realistic scenarios (lock contention kicks in earlier the more cores
you have).

And it doesn't seem great that something as rarely used as gss introduces
overhead to very common paths.

Here's a bottom-up profile:

- 32.10% pgbench [kernel.kallsyms] [k] queued_spin_lock_slowpath
- 32.09% queued_spin_lock_slowpath
- 16.15% futex_wake
do_futex
__x64_sys_futex
do_syscall_64
- entry_SYSCALL_64_after_hwframe
- 16.15% __GI___lll_lock_wake
- __GI___pthread_mutex_unlock_usercnt
- 5.12% gssint_select_mech_type
- 4.36% gss_inquire_attrs_for_mech
- 2.85% gss_indicate_mechs
- gss_indicate_mechs_by_attrs
- 1.58% gss_acquire_cred_from
gss_acquire_cred
pg_GSS_have_cred_cache
select_next_encryption_method (inlined)
init_allowed_encryption_methods (inlined)
PQconnectPoll
pqConnectDBStart (inlined)
PQconnectStartParams
PQconnectdbParams
doConnect

Clearly the contention originates outside of our code, but is triggered by
doing pg_GSS_have_cred_cache() every time a connection is established.

Greetings,

Andres Freund
