postgresql hanging/stuck

From: Andrzej Pilacik <cypisek77(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: postgresql hanging/stuck
Date: 2017-03-14 20:47:40
Message-ID: CAJw8uJQGoobesbPCMbxj6Vb4nv9D-GgvZ+7pK+fckbb4DqJEAg@mail.gmail.com
Lists: pgsql-bugs

The event timeline below describes the issue we are having:

1. We are running PostgreSQL 9.3.16 on Red Hat 7.1 (kernel 3.10.0-229.14.1.el7.x86_64)

2. 500 max connection limit

3. 2 CPUs x 18 cores, hyperthreaded (72 threads), 512GB RAM

4. 80% OLTP, 20% OLAP

Until late Feb we didn't encounter any issues on this server other than the
normal performance troubleshooting.

Since then we have had 2 incidents where the PostgreSQL connection count
went to about 300, and PostgreSQL became unresponsive and restarted itself.

We have monitors set up to page us when connections are over 250.
At that point PostgreSQL should still have about 250 connections available.

When this occurred, we saw a lot of connections stuck in the authentication
state.

*They all looked like: postgres: username databasename ipaddress (pid)
authentication*
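For anyone who wants to track how many backends are sitting in this state, here is a minimal Python sketch. It assumes the process titles have already been collected as plain strings (for example from `ps -eo args`), and that a stuck backend's title starts with "postgres:" and ends with "authentication", as in the titles described above. The function name and the sample titles are hypothetical and not taken from our system.

```python
def count_authenticating(process_titles):
    """Count process titles that look like backends in the authentication state."""
    count = 0
    for line in process_titles:
        title = line.strip()
        # Assumption: backend titles start with "postgres:" and the
        # state is the last word of the title.
        if title.startswith("postgres:") and title.endswith("authentication"):
            count += 1
    return count


# Hypothetical sample titles, shaped like the ones described above.
sample = [
    "postgres: appuser appdb 10.0.0.5(51234) authentication",
    "postgres: appuser appdb 10.0.0.6(51240) idle",
    "postgres: reportuser appdb 10.0.0.7(51300) authentication",
    "postgres: checkpointer process",
]
print(count_authenticating(sample))
```

Sampling this count every few seconds during an incident would show whether the authentication backlog grows steadily or arrives in a burst.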

We tried killing some of the idle connections but these attempts were
unsuccessful.

No new connections were able to be established at this time until the
authentication connections hit some high number (over 500) and we were
forced to restart the engine by restarting (killing) the postgresql PID.

The two incidents came 2 weeks apart, and apart from our normal processing
we found nothing that correlates with them or explains why this is happening.

During the incident (about 10-15min timeline)

- CPUs are running high (60-70%) but the box is still very responsive.

- Memory allocations are ok, no paging. Nothing looks out of norm.

- We have our normal scenario engine running with about 30-40 active
connections + some 5-10 active reporting and processing connections at a
time (lots of quick/short queries)

- When the ~300 connections hit and PostgreSQL becomes unresponsive, it
also stops writing to its log

- All new connections get stuck in the "authentication" state and queue up

- Kernel logs are clean, no system panic, nothing out of norm

We don't understand why so many connections would be stuck in the
authentication state.

This server has not been modified other than the quarterly PostgreSQL
update. Also, the server was rebooted just 2 days before the 2nd incident
because of storage maintenance.

PostgreSQL Configuration CUSTOM settings:

shared_buffers = 8GB

work_mem = 10MB

maintenance_work_mem = 1GB

wal_level = hot_standby

checkpoint_segments = 32

checkpoint_timeout = 10min

checkpoint_completion_target = 0.9

checkpoint_warning = 30s

archive_mode = on

max_wal_senders = 2

effective_cache_size = 393216MB
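As a quick sanity check of the memory-related settings against the 512GB of RAM, here is a small Python sketch of the arithmetic. The variable names are only illustrative. The work_mem figure assumes exactly one sort or hash operation per connection at the 500-connection limit, so it is a rough floor for a fully loaded server rather than a real bound, since a single query can use work_mem more than once.

```python
RAM_GB = 512            # from the hardware description
MAX_CONNECTIONS = 500   # configured connection limit


def mb_to_gb(mb):
    """Convert megabytes to gigabytes (1GB = 1024MB)."""
    return mb / 1024


effective_cache_gb = mb_to_gb(393216)   # effective_cache_size = 393216MB
shared_buffers_gb = 8                   # shared_buffers = 8GB
# Assumption: one 10MB work_mem allocation per connection.
work_mem_total_gb = mb_to_gb(10 * MAX_CONNECTIONS)

print(f"effective_cache_size: {effective_cache_gb:.0f}GB "
      f"({effective_cache_gb / RAM_GB:.0%} of RAM)")
print(f"shared_buffers: {shared_buffers_gb}GB "
      f"({shared_buffers_gb / RAM_GB:.1%} of RAM)")
print(f"work_mem at {MAX_CONNECTIONS} connections: {work_mem_total_gb:.2f}GB")
```

Under these assumptions the settings leave plenty of memory free, which is consistent with the observation that there was no paging during the incidents.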

Please let me know if there is anything I can do to escalate this issue.
