From: | Rui An <rueian(dot)huang(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Many queries stuck for waiting MultiXactOffsetControlLock on PostgreSQL 9.6.1 replica |
Date: | 2021-07-31 19:30:14 |
Message-ID: | CABjTVMR39ci6LUVE2yWhpcufPee4nO3cu32sij3Yh8viksCztQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi, I have recently encountered the problem, as title, on some of my
PostgreSQL 9.6.1 hot standby replicas when serving readonly queries.
The problem makes a lot of PostgreSQL processes stuck on waiting
for MultiXactOffsetControlLock and eventually runs out of
the max_connections quota.
For example:
> mydb=# select count(*) from pg_stat_activity where state = 'active' and
> wait_event = 'MultiXactOffsetControlLock';
> count
> ------
> 956
> (1 row)
I have tried use pg_terminate_backend to kill the stuck prcoesses, but no
help. They still stuck on waiting for MultiXactOffsetControlLock even if
the pg_terminate_backend(pid) returns true.
Currently, the only thing I can do to resolve the problem is to restart my
replicas.
Can someone help me find out how could the problem occurred and is there
any other way to resolve the problem?
I have also recorded some strace and perf top output on stuck processes:
strace:
> strace: Process 555 attached
> select(0, NULL, NULL, NULL, {0, 78} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> semop(60981276, [{0, -1, 0}, 1]) = 0
> semop(60981276, [{0, -1, 0}, 1]) = 0
> semop(61734963, [{14, -1, 0}, 1]) = 0
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> semop(61505580, [{12, -1, 0}, 1]) = 0
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> semop(60981276, [{0, -1, 0}, 1]) = 0
> semop(63471720, [{9, -1, 0}, 1]) = 0
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> semop(60981276, [{0, -1, 0}, 1]) = 0
> semop(63471720, [{9, -1, 0}, 1]) = 0
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
> semop(60981276, [{0, -1, 0}, 1]) = 0
perf top:
> Samples: 208 of event 'cpu-clock', Event count (approx.): 36045672
> Overhead Shared Object Symbol
> 30.74% postgres [.] perform_spin_delay
> 13.70% [kernel] [k] finish_task_switch
> 13.66% [kernel] [k] 0x00007fff81838f35
> 8.59% postgres [.] 0x0000000000319107
> 4.02% postgres [.] 0x0000000000318f04
> 2.47% postgres [.] pg_usleep
> 2.10% libc-2.23.so [.] __select
> 1.68% postgres [.] PGSemaphoreLock
> 1.68% postgres [.] 0x0000000000318eb2
> 1.54% [kernel] [k] pvclock_clocksource_read
> 1.53% postgres [.] GetMultiXactIdMembers
> 1.41% postgres [.] 0x0000000000318ea4
> 1.31% postgres [.] LWLockAcquire
> 1.30% postgres [.] 0x0000000000318e17
> 1.23% [kernel] [k] do_select
> 1.07% postgres [.] finish_spin_delay
> 1.06% [kernel] [k] poll_freewait
> 1.05% libc-2.23.so [.] semop
> 0.96% [kernel] [k] poll_schedule_timeout
> 0.72% [kernel] [k] SYSC_semtimedop
> 0.69% [kernel] [k] core_sys_select
> 0.69% [kernel] [k] pid_vnr
> 0.61% [kernel] [k] dev_gro_receive
> 0.61% libc-2.23.so [.] random
> 0.61% postgres [.] 0x0000000000318e3d
> 0.53% [kernel] [k] hrtimer_active
> 0.53% [kernel] [k] ipc_obtain_object_check
> 0.46% postgres [.] 0x0000000000318f00
> 0.46% postgres [.] 0x0000000000319100
> 0.46% postgres [.] 0x0000000000319103
> 0.41% postgres [.] LWLockRelease
> 0.41% postgres [.] 0x0000000000318f85
> 0.36% [kernel] [k] run_timer_softirq
> 0.36% [kernel] [k] sys_semop
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey Borodin | 2021-08-02 04:57:03 | Re: Many queries stuck for waiting MultiXactOffsetControlLock on PostgreSQL 9.6.1 replica |
Previous Message | Peter Geoghegan | 2021-07-31 08:51:44 | Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data |