From: | Bruno Richard <bruno(dot)richard(at)broadsign(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | PostgreSQL hot standby Hangs due to AccessExclusiveLock on pg_attribute or pg_type tables |
Date: | 2017-06-19 17:29:34 |
Message-ID: | CAB-EU3RawZx8-OzMfvswFf6z+Y7GOZf03TZ=bez+PbQX+A4M7Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
*Summary:*
PostgreSQL hot standby Hangs due to AccessExclusiveLock on pg_attribute or
pg_type tables
*Platform information (all instances)*
PostgreSQL 9.6.2, ltree extension enabled.
Ubuntu 12.04 64bits
*Description:*
We have an issue on our Database system since we migrated from PostgreSQL
9.3.2 to 9.6.2.
The issue is that one (or many) of our hot standby instances does an
AccessExclusiveLock on the pg_attribute or pg_type system tables that never
gets unlocked, causing hangs on our systems. We have to restart the
instance to get out of the lock. It happened ~30 times since end of April.
*Steps to reproduce:*
Looks like a race condition, happens randomly in production only. I am
trying to replicate in house.
*Description:*
This is the output from the pg_locks table when the issue occurs:
-[ RECORD 129 ]----+--------------------
locktype | relation
database | 16385
relation | 1249
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 1/0
pid | 19018
mode | AccessExclusiveLock
granted | t
fastpath | f
-[ RECORD 130 ]----+--------------------
locktype | relation
database | 16385
relation | 1249
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 170/47
pid | 10842
mode | AccessShareLock
granted | f
fastpath | f
The locking PID is the postmaster process.
When the pg_attribute(oid=1249) table is locked, we cannot login to the
database, it just hangs. It is possible to login to psql without specifying
the database.
When the pg_type(oid=1247) table is locked, we can login to the database,
but not able to perform queries, it hangs.
While the processes are hung, CPU usage is low and RAM is normal.
pg_logs will output many of these:
"Process holding the lock: 19018. Wait queue: 6022, 6023, 6038, 6456, 6466,
6467, 6718, 7077, 7079."
We noticed that Autovacuum is executed on those tables on the master just
before the issue happens:
GMT USER: PID:11218 LOG: automatic vacuum of table
"bssdb_admin.pg_catalog.pg_attribute": index scans: 1
pages: 22 removed, 116 remain, 0 skipped due to pins, 0 skipped
frozen
tuples: 424 removed, 6097 remain, 0 are dead but not yet removable
buffer usage: 720 hits, 0 misses, 92 dirtied
avg read rate: 0.000 MB/s, avg write rate: 7.055 MB/s
system usage: CPU 0.02s/0.08u sec elapsed 0.10 sec
Since we create some temp tables, autovacuum on pg_attribute is executed
~70 times per day, and the autovacuum on pg_type is executed ~20 times per
day.
The issue is randomly happening (normally every 2 days).
*Configuration Files:*
------
max_connections = 250
superuser_reserved_connections = 5
unix_socket_directories = '/tmp'
shared_buffers = 32GB
temp_buffers = 8MB
max_prepared_transactions = 5
work_mem = 16MB
maintenance_work_mem = 512MB
max_stack_depth = 2MB
vacuum_cost_delay = 0
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 200
wal_level = replica
wal_sync_method = fdatasync
wal_buffers = 16MB
#checkpoint_timeout = 30min
max_wal_size = 30GB
checkpoint_completion_target = 0.90
max_wal_senders = 30
wal_keep_segments = 8000
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_tuple_cost = 0.5
cpu_index_tuple_cost = 0.5
effective_cache_size = 120GB
default_statistics_target = 100
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
client_min_messages = notice
log_min_messages = notice
log_error_verbosity = default
log_min_error_statement = panic
log_min_duration_statement = 1000ms
log_hostname = off
log_line_prefix = '%t'
log_statement = 'none'
autovacuum = on
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = 200
datestyle = 'iso'
timezone = UTC
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
log_statement = 'ddl'
log_line_prefix = '%t USER:%u PID:%p '
log_connections = on
log_disconnections = on
log_lock_waits = on
log_autovacuum_min_duration = 100ms
--------
*Configs on hot standby db:*
--------
max_connections = 250
superuser_reserved_connections = 5
unix_socket_directories = '/tmp'
shared_buffers = 16GB
temp_buffers = 8MB
max_prepared_transactions = 5
work_mem = 16MB
maintenance_work_mem = 512MB
max_stack_depth = 2MB
vacuum_cost_delay = 0
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 200
wal_level = replica
wal_sync_method = fdatasync
wal_buffers = 16MB
hot_standby = on
#checkpoint_timeout = 30min
max_wal_size = 3GB
checkpoint_completion_target = 0.90
max_wal_senders = 30
#wal_keep_segments = 3000
#max_standby_archive_delay = 1200s
max_standby_streaming_delay = 1200s
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_tuple_cost = 0.5
cpu_index_tuple_cost = 0.5
effective_cache_size = 55GB
default_statistics_target = 100
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
client_min_messages = notice
log_min_messages = notice
log_error_verbosity = default
log_min_error_statement = panic
log_min_duration_statement = 1000ms
log_hostname = off
log_line_prefix = '%t'
log_statement = 'none'
autovacuum = on
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = 200
datestyle = 'iso'
timezone = UTC
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
log_line_prefix = '%t USER:%u PID:%p '
log_connections = on
log_disconnections = on
log_lock_waits = on
------
Thank you
--
Bruno Richard | Software Architect
BroadSign International, LLC. | http://broadsign.com/
T: +1.514.399.1184 | F: +1.514.399.1187
From | Date | Subject | |
---|---|---|---|
Next Message | J Chapman Flack | 2017-06-19 18:18:50 | Re: Postgresql bug report - unexpected behavior of suppress_redundant_updates_trigger |
Previous Message | Alvaro Herrera | 2017-06-19 17:19:06 | Re: [HACKERS] Postgresql bug report - unexpected behavior of suppress_redundant_updates_trigger |