Re: PG 9.6 Slow inserts with long-lasting LWLocks

From: MichaelDBA <MichaelDBA(at)sqlexec(dot)com>
To: Pavel Suderevsky <psuderevsky(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PG 9.6 Slow inserts with long-lasting LWLocks
Date: 2018-03-16 13:42:34
Message-ID: 5AABC9CA.1040709@sqlexec.com
Lists: pgsql-performance

Sporadic insert slowness could be due to lock delays (locktype=extend)
caused by many concurrent connections trying to insert into the same
table at the same time. Each insert request may result in an extend lock
(an 8k extension of the relation), which blocks other writers. Normally
these extend locks are held so briefly that you hardly ever see them in
pg_locks, except when many concurrent connections are inserting into the
same table. The following query will show whether this is the case if
you run it while the problem is occurring.

select * from pg_locks where granted = false and locktype = 'extend';

I don't know if this is your particular problem, but perhaps it is.
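
If rows do show up, a quick follow-up sketch along these lines (just an
illustration using the standard pg_locks and pg_stat_activity views;
adjust to taste) can show which sessions are waiting on the extension
lock and what they are running:

-- illustrative only: list backends waiting on relation extension locks
-- together with their current query and reported wait event
select w.pid as waiting_pid,
       w.relation::regclass as relation,
       a.wait_event_type,
       a.wait_event,
       a.query
from pg_locks w
join pg_stat_activity a on a.pid = w.pid
where w.locktype = 'extend'
  and not w.granted;

Seeing many waiters on the same relation during a slow period would
support the extension-lock theory.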

Regards,
Michael Vitale
> Pavel Suderevsky <psuderevsky(at)gmail(dot)com>
> Thursday, March 15, 2018 6:29 AM
>
> Hi,
>
> Well, unfortunately I still need community help.
>
> -- Environment
>
> OS: CentOS Linux release 7.2.1511
>
> Kernel: 3.10.0-327.36.3.el7.x86_64
>
> PostgreSQL: 9.6.3
>
> -- Hardware
>
> Server: Dell PowerEdge R430
>
> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>
> Raid controller: PERC H730 Mini (1GB cache)
>
> Disks: 8 x 10K RPM SAS 12Gb/s 2.5" (ST1200MM0088) in RAID 6
>
> RAM: 192GB (M393A2G40DB0-CPB x 16)
>
> For more detailed hardware info please see attached configuration.txt
>
> -- postgresql.conf
>
> max_connections = 2048
>
> shared_buffers = 48GB
>
> temp_buffers = 128MB
>
> work_mem = 256MB
>
> maintenance_work_mem = 512MB
>
> dynamic_shared_memory_type = posix
>
> wal_level = hot_standby
>
> min_wal_size = 4GB
>
> max_wal_size = 32GB
>
> huge_pages = on
>
> +
>
> numactl --interleave=all
>
> -- sysctl.conf
>
> kernel.shmmax=64424509440
>
> kernel.shmall=4294967296
>
> kernel.sem = 1024 32767 128 16384
>
> fs.aio-max-nr=3145728
>
> fs.file-max = 6815744
>
> net.core.rmem_default=262144
>
> net.core.rmem_max=4194304
>
> net.core.wmem_default=262144
>
> net.core.wmem_max=1048586
>
> vm.nr_hugepages=33000
>
> vm.dirty_background_bytes=67108864
>
> vm.dirty_bytes=536870912
>
> vm.min_free_kbytes=1048576
>
> vm.zone_reclaim_mode=0
>
> Again: the problem is occasional long inserts that can happen 1-5
> times per day on an OLTP system.
>
> No autovacuum runs during the long inserts. WAL rate is 1-2Gb per
> hour; no correlation with this issue has been spotted.
>
> The wait event "buffer_mapping" appears for the affected transactions,
> but not every time (or perhaps it is just not always caught).
>
> I have two suspects for this behaviour: the I/O system and high concurrency.
>
> There is a problem with one application that frequently recreates up
> to 90 sessions, but investigation shows no direct correlation between
> those sessions and the long transactions; at least it is not the root
> cause of the issue (of course that application behaviour will be
> fixed).
>
> Investigation, and tracing with strace in particular, showed that:
>
> 1. The only long event straced from the postgres backends was <... semop
> resumed>.
>
> 2. The whole host seems to hang during such events.
>
> Example:
>
> The Java application, located on a separate host, reports several long
> transactions:
>
> 123336.943 - [1239588mks]: event.insert-table
>
> 123336.943 - [1240827mks]: event.insert-table
>
> 123337.019 - [1292534mks]: event.insert-table
>
> 143353.542 - [5467657mks]: event.insert-table
>
> 143353.543 - [5468884mks]: event.insert-table
>
> 152338.763 - [1264588mks]: event.insert-table
>
> 152338.765 - [2054887mks]: event.insert-table
>
> Strace output for the event that happened at 14:33, for a particular known pid:
>
> 119971 14:33:48.075375 epoll_wait(3, <unfinished ...>
>
> 119971 14:33:48.075696 <... epoll_wait resumed> {{EPOLLIN,
> {u32=27532016, u64=27532016}}}, 1, -1) = 1 <0.000313>
>
> 119971 14:33:48.075792 recvfrom(9, <unfinished ...>
>
> 119971 14:33:48.075866 <... recvfrom resumed>
> "B\0\0\3\27\0S_21\0\0*\0\1\0\1\0\1\0\0\0\0\0\1\0\1\0\0\0\0\0"...,
> 8192, 0, NULL, NULL) = 807 <0.000066>
>
> 119971 14:33:48.076243 semop(26706044, {{8, -1, 0}}, 1 <unfinished ...>
>
> 120019 14:33:48.119971 recvfrom(9, <unfinished ...>
>
> 119971 14:33:53.491029 <... semop resumed> ) = 0 <5.414772>
>
> 119971 14:33:53.500356 lseek(18, 0, SEEK_END <unfinished ...>
>
> 119971 14:33:53.500436 <... lseek resumed> ) = 107790336 <0.000072>
>
> 119971 14:33:53.500514 lseek(20, 0, SEEK_END <unfinished ...>
>
> Checking the strace output for long semop calls over the whole day:
>
> root(at)host [20180314 17:47:36]:/home/user$ egrep " <[1-9]." /tmp/strace
> | grep semop
>
> 119991 12:33:36 <... semop resumed> ) = 0 <1.419394>
>
> 119942 12:33:36 <... semop resumed> ) = 0 <1.422554>
>
> 119930 12:33:36 <... semop resumed> ) = 0 <1.414916>
>
> 119988 12:33:36 <... semop resumed> ) = 0 <1.213309>
>
> 119966 12:33:36 <... semop resumed> ) = 0 <1.237492>
>
> 119958 14:33:53.489398 <... semop resumed> ) = 0 <5.455830>
>
> 120019 14:33:53.490613 <... semop resumed> ) = 0 <5.284505>
>
> 119997 14:33:53.490638 <... semop resumed> ) = 0 <5.111661>
>
> 120000 14:33:53.490649 <... semop resumed> ) = 0 <3.521992>
>
> 119991 14:33:53.490660 <... semop resumed> ) = 0 <2.522460>
>
> 119988 14:33:53.490670 <... semop resumed> ) = 0 <5.252485>
>
> 120044 14:33:53.490834 <... semop resumed> ) = 0 <1.718129>
>
> 119976 14:33:53.490852 <... semop resumed> ) = 0 <2.489563>
>
> 119974 14:33:53.490862 <... semop resumed> ) = 0 <1.520801>
>
> 119984 14:33:53.491011 <... semop resumed> ) = 0 <1.213411>
>
> 119971 14:33:53.491029 <... semop resumed> ) = 0 <5.414772>
>
> 119969 14:33:53.491039 <... semop resumed> ) = 0 <2.275608>
>
> 119966 14:33:53.491048 <... semop resumed> ) = 0 <2.526024>
>
> 119942 14:33:53.491058 <... semop resumed> ) = 0 <5.448506>
>
> 119964 15:23:38.746394 <... semop resumed> ) = 0 <2.034851>
>
> 119960 15:23:38.746426 <... semop resumed> ) = 0 <2.038321>
>
> 119966 15:23:38.752646 <... semop resumed> ) = 0 <1.252342>
>
> It was also spotted that the WAL writer backend spends time in
> <... semop resumed> during the hangs.
>
> I also have an application on the DB host that takes pg_stat_activity
> snapshots every 500ms, and I can see, for example, that there was no
> snapshot between 14:33:47 and 14:33:53.
>
> A separate simple script on the DB host checks the ps output for this
> application every ~100ms and writes it to a text file. While it usually
> runs about 7-8 times per second, between 14:33:47 and 14:33:53 it could
> not even complete enough ps calls. Strace for this backend showed that
> the process was stuck in a semop call. So it tells me that the whole
> system hangs.
>
> 14:33:40 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:41 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:42 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:43 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:44 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:45 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:46 TOTAL=6 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:47 TOTAL=2 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:48 TOTAL=3 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:49 TOTAL=2 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:50 TOTAL=2 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:51 TOTAL=2 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:52 TOTAL=4 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:53 TOTAL=6 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:54 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> 14:33:55 TOTAL=7 wait_transaction_locked=0 sleep_on_page=0
> get_request=0 sleep_on_buffer=0
>
> I understand that RAID 6 is not the best option, but I can't find any
> evidence that the system runs out of the 1GB RAID controller cache on
> writes.
>
> Please help me understand the meaning and nature of these long semop
> calls.
>
> --
>
> Regards,
>
> Pavel Suderevsky
>
> From: Pavel Suderevsky <psuderevsky(at)gmail(dot)com>
> Sent: Monday, January 22, 2018 21:43
> To: pgsql-performance(at)postgresql(dot)org
> Subject: PG 9.6 Slow inserts with long-lasting LWLocks
>
> Hi,
>
> I have an issue with sporadic slow insert operations: query duration
> exceeds 1 second, while the average is about 50ms.
>
> Configuration:
>
> OS: CentOS 7.2.1511
>
> PostgreSQL: 9.6.3
>
> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>
> Memory:
>
>           total     used     free   shared  buff/cache  available
> Mem:     193166    10324     1856    44522      180985     137444
> Swap:         0        0        0
>
> Storage: about 4GB of BBU write cache.
>
> shared_buffers = 32gb
>
> work_mem = 128mb
>
> max_pred_locks_per_transaction = 8192
>
> This can occur once a day or not happen for a few days while the
> system load stays the same. The "inserts" are prepared statement
> batches of 4-5 INSERTs.
>
> Neither excessive memory usage nor high disk or CPU utilization has
> been observed.
>
> WAL write rates, checkpoints, and everything else from the pg_stat_*
> views were checked, and nothing suspicious was found.
>
> Several scenarios of such long inserts have been spotted:
>
> 1. No locks caught at all (500ms check intervals)
>
> 2. Wait event is "buffer_mapping" - looks like the most common case
>
> snapshot time              | state  | trx duration    | query duration  | wait_event_type | wait_event     | query
> 2017-12-22 03:16:01.181014 | active | 00:00:00.535309 | 00:00:00.524729 | LWLockTranche   | buffer_mapping | INSERT INTO table..
> 2017-12-22 03:16:00.65814  | active | 00:00:00.012435 | 00:00:00.001855 | LWLockTranche   | buffer_mapping | INSERT INTO table..
>
> 3. Wait event is "SerializablePredicateLockListLock" (I believe it has
> the same root cause as the previous case)
>
> 4. No locks caught, but ~39 other backends active in parallel
>
> I assumed it could somehow be related to NUMA being enabled, but memory
> appears to be allocated evenly and zone_reclaim_mode is 0.
>
> numactl --hardware
>
> available: 2 nodes (0-1)
>
> node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
> 42 44 46
>
> node 0 size: 130978 MB
>
> node 0 free: 1251 MB
>
> node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41
> 43 45 47
>
> node 1 size: 65536 MB
>
> node 1 free: 42 MB
>
> node distances:
>
> node 0 1
>
> 0: 10 21
>
> 1: 21 10
>
> numastat -m
>
> Per-node system memory usage (in MBs):
>
> Node 0 Node 1 Total
>
> --------------- --------------- ---------------
>
> MemTotal 130978.34 65536.00 196514.34
>
> MemFree 1479.07 212.12 1691.20
>
> MemUsed 129499.27 65323.88 194823.14
>
> Active 72241.16 37254.56 109495.73
>
> Inactive 47936.24 24205.40 72141.64
>
> Active(anon) 21162.41 18978.96 40141.37
>
> Inactive(anon) 1061.94 7522.34 8584.27
>
> Active(file) 51078.76 18275.60 69354.36
>
> Inactive(file) 46874.30 16683.06 63557.36
>
> Unevictable 0.00 0.00 0.00
>
> Mlocked 0.00 0.00 0.00
>
> Dirty 0.04 0.02 0.05
>
> Writeback 0.00 0.00 0.00
>
> FilePages 116511.36 60923.16 177434.52
>
> Mapped 16507.29 23912.82 40420.11
>
> AnonPages 3661.55 530.26 4191.81
>
> Shmem 18558.28 25964.74 44523.02
>
> KernelStack 16.98 5.77 22.75
>
> PageTables 3943.56 1022.25 4965.81
>
> NFS_Unstable 0.00 0.00 0.00
>
> Bounce 0.00 0.00 0.00
>
> WritebackTmp 0.00 0.00 0.00
>
> Slab 2256.09 1291.53 3547.61
>
> SReclaimable 2108.29 889.85 2998.14
>
> SUnreclaim 147.80 401.68 549.47
>
> AnonHugePages 1824.00 284.00 2108.00
>
> HugePages_Total 0.00 0.00 0.00
>
> HugePages_Free 0.00 0.00 0.00
>
> HugePages_Surp 0.00 0.00 0.00
>
> $ cat /proc/62679/numa_maps | grep N0 | grep zero
>
> 7f92509d3000 prefer:0 file=/dev/zero\040(deleted) dirty=8419116
> mapmax=154 active=8193350 N0=3890534 N1=4528582 kernelpagesize_kB=4
>
> Could you advise what can cause such occasional long inserts with
> long-lasting LWLocks?
>
