From: | Krzysztof Olszewski <kolszew73(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Postgresql server gets stuck at low load |
Date: | 2020-06-05 10:07:02 |
Message-ID: | CAHihO3wAmb_b=uMT86PhJYk3-C-4_pNBqJ-TUehZVk82ifFFDw@mail.gmail.com |
Lists: | pgsql-performance |
I have a problem with one of my production Postgres servers. The server
works fine almost all the time, but sometimes, without any increase in the
number of transactions or statements, the machine gets stuck. The cores go
up to 100% and load up to 160%. When this happens, it is hard to connect to
the database, and even when a connection succeeds, simple queries take
several seconds instead of milliseconds. The problem sometimes stops by
itself after a while (e.g. 35 min); sometimes we must restart Postgres,
Linux, or even KVM (the virtualization host).
My hardware
56 cores (Intel Core Processor (Skylake, IBRS))
400 GB RAM
RAID10 with about 40k IOPS
OS
CentOS Linux release 7.7.1908
kernel 3.10.0-1062.18.1.el7.x86_64
Database size 100 GB (fits entirely in memory :) )
server_version 10.12
effective_cache_size 192000 MB
maintenance_work_mem 2048 MB
max_connections 150
shared_buffers 64000 MB
work_mem 96 MB
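As a rough sanity check on the memory settings above, here is a minimal Python sketch of the arithmetic. It assumes at most one work_mem-sized allocation per connection (a single query can in fact use several, one per sort or hash operation) and reads "400 GB" as 400 GiB; the function name and its parameter are hypothetical, chosen only for illustration.

```python
# Settings copied from the list above (values in MB).
SHARED_BUFFERS_MB = 64000
WORK_MEM_MB = 96
MAX_CONNECTIONS = 150
RAM_MB = 400 * 1024  # assumption: "400 GB" means 400 GiB


def rough_memory_estimate_mb(allocations_per_connection=1):
    """Estimate shared_buffers plus work_mem for every allowed connection.

    allocations_per_connection is a hypothetical knob: work_mem applies per
    sort/hash operation, so a complex query may use it more than once.
    """
    per_connection_mb = WORK_MEM_MB * allocations_per_connection
    return SHARED_BUFFERS_MB + MAX_CONNECTIONS * per_connection_mb


estimate_mb = rough_memory_estimate_mb()
share_of_ram = estimate_mb / RAM_MB
print(f"{estimate_mb} MB, {share_of_ram:.1%} of RAM")
```

Under these assumptions the settings stay far below the installed RAM, even with all 150 connections in use.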
In the normal state I have about 500 tps, 5% core usage, and about 3% load.
The whole database fits in memory, so there are no reads from disk, only
writes at about 500 IOPS, with occasional spikes to 1500 IOPS; this hardware
has no problem with these values (no iowait on the cores). In the normal
state this machine does "nothing". Connections to the database are created
by two Java-based app servers through connection pools, so the connection
count is limited by the pool configuration: the maximum is 120, which is
lower than the Postgres limit (150). In the normal state there are about 20
connections; when the server is stuck, the count goes up to the maximum
(120).
Correlated with the stuck periods, I see messages like this in the kernel
log:
NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
but I don't know whether this is a cause or an effect of the problem.
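To line such kernel messages up against the stuck periods, the soft lockup line can be split into its fields. This is a minimal Python sketch that assumes the message format is exactly the one quoted above; the pattern and the function name are illustrative, not part of any existing tool.

```python
import re

# Pattern written against the single sample line quoted above.
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<process>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the CPU, duration, process name and PID, or None if no match."""
    match = SOFT_LOCKUP_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "seconds": int(match.group("seconds")),
        "process": match.group("process"),
        "pid": int(match.group("pid")),
    }


sample = "NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]"
event = parse_soft_lockup(sample)
print(event)
```

The PID from each event could then be compared with the backend PIDs seen in the Postgres log for the same time window.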
I investigated with pgBadger and ... nothing strange shows up, just normal
statements.
Any ideas?
Thanks,
Kris