From: | Ray Stell <stellr(at)vt(dot)edu> |
---|---|
To: | Matthew Spilich <mspilich(at)tripadvisor(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Stalls on PGSemaphoreLock |
Date: | 2014-03-25 17:17:11 |
Message-ID: | 295608A9-10CA-4F3C-B45D-A98EA8E8F8CF@gmail.com |
Lists: | pgsql-performance |
On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:
> The symptom: The database machine (running postgres 9.1.9 on CentOS 6.4) is running at low utilization most of the time, but once every day or two, it will appear to slow down to the point where queries back up and clients are unable to connect. Once this event occurs, there are lots of concurrent queries and I see slow queries appear in the logs, but there doesn't appear to be anything abnormal that I have been able to see that causes this behavior.
...
> Has anyone on the forum seen something similar? Any suggestions on what to look at next? If it is helpful to describe the server hardware, it's got two E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6 TB of RAID 10 local storage (15K 300 GB drives).
I could be way off here, but years ago I experienced something like this (in Oracle land), and after some stressful chasing, a marginal failure of the RAID controller revealed itself. It was the same kind of event: steady traffic, and then some I/O would not complete and normal operations would stack up. Anyway, what you report reminded me of that event. The E5 is a few years old, so I wonder if the RAID controller firmware needs a patch. I suppose a marginal power supply might cause a similar "hang." Marginal failures are very painful. Have you checked sar or OS logging at event time?