From: | Lars Aksel Opsahl <Lars(dot)Opsahl(at)nibio(dot)no> |
---|---|
To: | "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | SubtransControlLock and performance problems |
Date: | 2020-02-16 17:15:25 |
Message-ID: | HE1P189MB0266B070D1452B04A7C9F6D99D170@HE1P189MB0266.EURP189.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi
On a server with 32 cores and 250 GB memory, with CentOS 7 and kernel 4.4.214-1.el7.elrepo.x86_64, I try to run 30 parallel threads using dblink. (https://github.com/larsop/postgres_execute_parallel) . I have tried to disconnect and reconnect in the dblink code and that did not help.
If I reduce the number of threads I get less CPU usage and much less SubtransControlLock.
Each thread are inserting many lines into a Postgis Topology layer. I have a lot of try catch in this code to avoid missing lines (https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_using_proc) .
What happens is that after some minutes the CPU can fall to maybe 20% usage and most of the threads are blocked by SubtransControlLock, and when the number SubtransControlLock goes down the CPU load increases again. The jobs usually goes through without any errors, but it takes to long time because of the SubtransControlLock blocks.
There is no iowait on the server and there is plenty of free memory on the server. There seems to be no locks on the common tables.
“SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;” is always empty.
I am using a lot temp tables and unlogged tables.
To reduce the number locks I have a simple check before I kick off new jobs like the one below, but that did not help very much either. Yes it does a lot waiting, but SubtransControlLock kick in when all threads are up running again.
LOOP
EXECUTE Format('SELECT count(*) from pg_stat_activity where wait_event = %L and query like %L',
'SubtransControlLock',
'CALL resolve_overlap_gap_save_single_cells%') into subtransControlLock;
EXIT WHEN subtransControlLock = 0;
subtransControlLock_count := subtransControlLock_count + 1;
PERFORM pg_sleep(subtransControlLock*subtransControlLock_count*0.1);
END LOOP;
I have tested with postgres 11, postgres 12, postgis 2.5 , postgis 3.0 and it seems to behave save.
I have also tried to recompile postgres with the setting below and that did not solve the problem either.
/* Number of SLRU buffers to use for subtrans */
#define NUM_SUBTRANS_BUFFERS 2048
I have tested different values for memory and other settings nothing seems to matter. Here are the settings right now.
maintenance_work_mem = 8GB
max_connections = 600
work_mem = 500MB
temp_buffers = 100MB
shared_buffers = 64GB
effective_cache_size = 124GB
wal_buffers = 640MB
seq_page_cost = 2.0
random_page_cost = 2.0
checkpoint_flush_after = 2MB
checkpoint_completion_target = 0.9
default_statistics_target = 1000
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
effective_io_concurrency = 500 # 1-1000; 0 disables prefetching
# test to avoid SubtransControlLock
#bgwriter_lru_maxpages = 100000
#bgwriter_lru_maxpages=0
#bgwriter_delay = 20ms
synchronous_commit = off
Any idea about how to solve this ?
Lars
From | Date | Subject | |
---|---|---|---|
Next Message | Mike Sofen | 2020-02-17 00:43:10 | RE: Partial index creation always scans the entire table |
Previous Message | Tom Lane | 2020-02-16 16:35:43 | Re: Partial index creation always scans the entire table |