Need help identifying a periodic performance issue.

From: Robert Creager <robertc(at)spectralogic(dot)com>
To: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Need help identifying a periodic performance issue.
Date: 2021-11-16 04:43:25
Message-ID: 3A76761B-3DDD-4A3C-B791-E73107AFC9A2@spectralogic.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Hello all,

A description of what you are trying to achieve and what results you expect.:

We’re executing the following copy to fill a table with approximately 5k records, then repeating for a total of 250k records. Normally, this copy executes < 1 second, with the entire set taking a couple of minutes. The problem is not reproducible on command, but usually within a couple of hours of starting some test runs.

COPY ds3.blob (byte_offset, checksum, checksum_type, id, length, object_id) FROM STDIN WITH DELIMITER AS ‘|’

But, occasionally we get into a huge performance bottleneck for about 2 hours, where these copy operations are taking 140 seconds or so

Nov 15 22:25:49 sm4u-34 postgres[5799]: [381-1] db=tapesystem,user=Administrator,app=PostgreSQL JDBC Driver,client=127.0.0.1 LOG: duration: 145326.293 ms statement: COPY ds3.blob (byte_offset, checksum, checksum_type, id, length, object_id) FROM STDIN WITH DELIMITER AS '|'

One CPU is pegged, the data has been sent over STDIN, so Postgres is not waiting for more, there are no other queries running using this select:

SELECT pid,
client_port,
now() - query_start AS "runtime",
query_start,
datname,
state,
wait_event_type,
query,
usename
FROM pg_stat_activity
WHERE query !~ 'pg_stat_activity' AND
state != 'idle'
ORDER BY state, runtime DESC;

# pid,client_port,runtime,query_start,datname,state,wait_event_type,query,usename
5799,27136,0 years 0 mons 0 days 0 hours 2 mins 15.534339 secs,2021-11-15 22:23:23.932988 +00:00,tapesystem,active,,"COPY ds3.blob (byte_offset, checksum, checksum_type, id, length, object_id) FROM STDIN WITH DELIMITER AS '|'",Administrator

I’m logging statements with pgbadger monitoring the logs. There are no apparent auto-vacuum’s running, nor any vacuums, nor anything at all really. Other select queries around that time frame are executing normally.

We’re coming from PostgreSQL 9.6 on FreeBSD 11 where we did not see this problem, but have a major release upgrade happening. I’m checking to see if this machine was updated or was a fresh install.

From pbgadger:

[cid:26C88B9B-2CBD-4C70-B07B-3847B2E57C4E]

PostgreSQL version number you are running:

PostgreSQL 13.2 on amd64-portbld-freebsd13.0, compiled by FreeBSD clang version 11.0.1 (git(at)github(dot)com<mailto:git(at)github(dot)com>:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe), 64-bit

How you installed PostgreSQL:

Ports tree, compiled from source.

Changes made to the settings in the postgresql.conf file

name | current_setting | source
---------------------------------+---------------------------------+--------------------
application_name | psql | client
autovacuum_analyze_scale_factor | 0.05 | configuration file
autovacuum_analyze_threshold | 5000 | configuration file
autovacuum_max_workers | 8 | configuration file
autovacuum_vacuum_cost_delay | 5ms | configuration file
autovacuum_vacuum_scale_factor | 0.1 | configuration file
autovacuum_vacuum_threshold | 5000 | configuration file
checkpoint_completion_target | 0.9 | configuration file
checkpoint_timeout | 30min | configuration file
checkpoint_warning | 5min | configuration file
client_encoding | UTF8 | client
commit_delay | 1000 | configuration file
DateStyle | ISO, MDY | configuration file
default_text_search_config | pg_catalog.english | configuration file
dynamic_shared_memory_type | posix | configuration file
effective_cache_size | 58076MB | configuration file
effective_io_concurrency | 200 | configuration file
full_page_writes | off | configuration file
hot_standby | off | configuration file
lc_messages | C | configuration file
lc_monetary | C | configuration file
lc_numeric | C | configuration file
lc_time | C | configuration file
listen_addresses | * | configuration file
log_autovacuum_min_duration | 1s | configuration file
log_checkpoints | on | configuration file
log_connections | on | configuration file
log_destination | syslog | configuration file
log_disconnections | on | configuration file
log_duration | off | configuration file
log_line_prefix | db=%d,user=%u,app=%a,client=%h | configuration file
log_lock_waits | on | configuration file
log_min_duration_sample | 100ms | configuration file
log_min_duration_statement | 1ms | configuration file
log_statement_sample_rate | 0.01 | configuration file
log_temp_files | 0 | configuration file
log_timezone | UTC | configuration file
maintenance_work_mem | 3927MB | configuration file
max_connections | 250 | configuration file
max_parallel_workers_per_gather | 8 | configuration file
max_replication_slots | 0 | configuration file
max_stack_depth | 32MB | configuration file
max_wal_senders | 0 | configuration file
max_wal_size | 50GB | configuration file
max_worker_processes | 8 | configuration file
random_page_cost | 2 | configuration file
shared_buffers | 21679MB | configuration file
synchronous_commit | off | configuration file
temp_buffers | 1309MB | configuration file
TimeZone | UTC | configuration file
track_activities | on | configuration file
track_counts | on | configuration file
update_process_title | off | configuration file
vacuum_cost_delay | 1ms | configuration file
wal_init_zero | off | configuration file
wal_level | minimal | configuration file
wal_recycle | off | configuration file
wal_skip_threshold | 20MB | configuration file
wal_sync_method | fsync | configuration file
wal_writer_delay | 500ms | configuration file
wal_writer_flush_after | 10MB | configuration file
work_mem | 1309MB | configuration file

Operating system and version:

FreeBSD sm4u-34 13.0-STABLE FreeBSD 13.0-STABLE #0: Mon Sep 13 10:11:57 MDT 2021

What program you're using to connect to PostgreSQL:

In house Java 1.8 program, JDBC 42.2.20

Is there anything relevant or unusual in the PostgreSQL server logs?:

Nope

Other info

tapesystem=# SELECT * FROM pg_config();
name | setting
-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BINDIR | /usr/local/bin
DOCDIR | /usr/local/share/doc/postgresql
HTMLDIR | /usr/local/share/doc/postgresql
INCLUDEDIR | /usr/local/include
PKGINCLUDEDIR | /usr/local/include/postgresql
INCLUDEDIR-SERVER | /usr/local/include/postgresql/server
LIBDIR | /usr/local/lib
PKGLIBDIR | /usr/local/lib/postgresql
LOCALEDIR | /usr/local/share/locale
MANDIR | /usr/local/man
SHAREDIR | /usr/local/share/postgresql
SYSCONFDIR | /usr/local/etc/postgresql
PGXS | /usr/local/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE | '--with-libraries=/usr/local/lib' '--with-includes=/usr/local/include' '--enable-thread-safety' '--with-icu' '--disable-debug' '--disable-dtrace' '--without-gssapi' '--without-ldap' '--disable-nls' '--without-pam' '--with-openssl' '--with-system-tzdata=/usr/share/zoneinfo' '--without-libxml' '--with-llvm' '--prefix=/usr/local' '--localstatedir=/var' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd13.0' 'build_alias=amd64-portbld-freebsd13.0' 'CC=cc' 'CFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -lpthread -L/usr/local/lib -fstack-protector-strong ' 'LIBS=' 'CPPFLAGS=-I/usr/local/include' 'CXX=c++' 'CXXFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'LLVM_CONFIG=/usr/local/bin/llvm-config11' 'CPP=cpp' 'PKG_CONFIG=pkgconf' 'LDFLAGS_SL='
CC | cc
CPPFLAGS | -I/usr/local/include -I/usr/local/include -I/usr/local/include -I/usr/local/include
CFLAGS | -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 -pipe -fstack-protector-strong -fno-strict-aliasing
CFLAGS_SL | -fPIC -DPIC
LDFLAGS | -L/usr/local/lib -lpthread -L/usr/local/lib -fstack-protector-strong -L/usr/local/llvm11/lib -L/usr/local/lib -Wl,--as-needed -Wl,-R'/usr/local/lib'
LDFLAGS_EX |
LDFLAGS_SL |
LIBS | -lpgcommon -lpgport -lssl -lcrypto -lz -lreadline -lexecinfo -lm
VERSION | PostgreSQL 13.2

hw.model: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
hw.machine: amd64
hw.ncpu: 24

hw.physmem: 137287901184

2 mirrored ZFS SSD disks. Just for fun, we briefly broke the mirror during one of these slowdowns, nothing changed.

These are the system calls made over 30 seconds from Postgres during a slowdown.

getrusage 1
access 3
exit 3
fork 3
getrandom 3
pipe2 3
procctl 3
setsid 3
thr_self 3
__sysctl 5
mmap 6
wait4 6
kill 11
select 14
sigreturn 14
rename 18
getpid 21
fsync 27
pwrite 27
openat 28
sigaction 33
write 50
fstat 56
open 56
sigprocmask 98
read 143
getppid 163
kqueue 163
fcntl 175
close 249
sendto 629
kevent 1069
recvfrom 1192
lseek 3604
fstatat 15894

Thanks for any assistance.

Best,
Robert

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2021-11-16 05:29:38 Re: Need help identifying a periodic performance issue.
Previous Message Tom Lane 2021-11-15 15:25:31 Re: Lock contention high