RE : Stalls on PGSemaphoreLock

From: Pavy Philippe <Philippe(dot)Pavy(at)worldline(dot)com>
To: Ray Stell <stellr(at)vt(dot)edu>, Matthew Spilich <mspilich(at)tripadvisor(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: RE : Stalls on PGSemaphoreLock
Date: 2014-03-25 17:45:17
Message-ID: 5F8F324242D0E14B97060D4D32CD0F5C8279F83512@FRSPX100.fr01.awl.atosorigin.net
Lists: pgsql-performance

Hello

I recently had a similar problem. The first symptom was frozen connections and 100% system (SYS) CPU for between 2 and 10 minutes, once or twice a day.
New connections were impossible and queries were slow. An strace on one backend showed a very long semop() system call.
The node has 48 cores and 128 GB of memory.

We disabled huge pages and raised the semaphore limits, and since then we have had no more freezes on our instance.
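For readers sizing semaphore limits, here is a minimal Python sketch of the minimums listed in the PostgreSQL 9.1 "Managing Kernel Resources" documentation: semaphores are allocated in sets of 16, each set using 17 semaphores. The `+ 4` term reflects the 9.1-era formula and is an assumption to verify against the docs for your own version (later releases changed it). The function names are hypothetical, and the totals leave no headroom for other applications that use semaphores.

```python
import math


def pg91_semaphore_minimums(max_connections: int, autovacuum_max_workers: int = 3) -> dict:
    """Minimum SysV semaphore limits for one PostgreSQL 9.1 cluster (assumed formula)."""
    # Semaphores are allocated in sets of 16 backends; round up to whole sets.
    sets = math.ceil((max_connections + autovacuum_max_workers + 4) / 16)
    return {
        "SEMMSL": 17,         # semaphores per set (16 + 1 extra per set)
        "SEMMNI": sets,       # number of semaphore sets (identifiers)
        "SEMMNS": sets * 17,  # total semaphores, excluding other applications
    }


def shortfalls(required: dict, current: dict) -> dict:
    """Return {name: (current, required)} for every limit that is too low."""
    return {k: (current[k], v) for k, v in required.items() if current.get(k, 0) < v}


if __name__ == "__main__":
    need = pg91_semaphore_minimums(max_connections=500)
    print(need)
```

With `max_connections=500` and the default three autovacuum workers, this asks for 32 sets and 544 semaphores in total. Compare the result with the values your kernel actually reports before raising anything.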

Can you check the huge page and semaphore configuration on your node?
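One way to do that check is a short Python sketch that reads the kernel's own files. It assumes that "huge pages" here means transparent huge pages (THP). Mainline kernels expose THP under `/sys/kernel/mm/transparent_hugepage/`, while RHEL/CentOS 6 kernels use `/sys/kernel/mm/redhat_transparent_hugepage/`. The semaphore limits are read from `/proc/sys/kernel/sem`. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Mainline kernels expose THP at the first path; RHEL/CentOS 6 uses the second.
THP_PATHS = (
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),
)
SEM_PATH = Path("/proc/sys/kernel/sem")


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the active THP mode; the kernel marks it with brackets."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def parse_kernel_sem(text: str) -> dict:
    """Split kernel.sem into its four fields: SEMMSL SEMMNS SEMOPM SEMMNI."""
    names = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")
    return dict(zip(names, (int(x) for x in text.split())))


def read_first_existing(paths) -> Optional[str]:
    """Return the contents of the first path that exists, else None."""
    for p in paths:
        if p.exists():
            return p.read_text()
    return None


def report() -> None:
    thp = read_first_existing(THP_PATHS)
    print("THP mode:", parse_thp_mode(thp) if thp else "not found")
    sem = read_first_existing([SEM_PATH])
    print("kernel.sem:", parse_kernel_sem(sem) if sem else "not found")


if __name__ == "__main__":
    report()
```

On a machine with THP enabled, the first line would typically report `always`. A mode of `never` means THP is disabled.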

I am interested in this case, so please do not hesitate to get back to me with what you find. Thanks.

Please excuse my bad English!

________________________________________
From: pgsql-performance-owner(at)postgresql(dot)org [pgsql-performance-owner(at)postgresql(dot)org] on behalf of Ray Stell [stellr(at)vt(dot)edu]
Sent: Tuesday, 25 March 2014 18:17
To: Matthew Spilich
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

The symptom: The database machine (running postgres 9.1.9 on CentOS 6.4) runs at low utilization most of the time, but once every day or two it appears to slow down to the point where queries back up and clients are unable to connect. Once this event occurs, there are lots of concurrent queries and I see slow queries appear in the logs, but I have not been able to find anything abnormal that causes this behavior.
...
Has anyone on the forum seen something similar? Any suggestions on what to look at next? In case it helps to describe the server hardware: it has two E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6 TB RAID 10 local storage (15K, 300 GB drives).

I could be way off here, but years ago I experienced something like this (in Oracle land), and after some stressful chasing, a marginal failure of the RAID controller revealed itself. It was the same kind of event: steady traffic, then some I/O would not complete and normal operations would stack up. Anyway, what you report reminded me of that event. The E5 is a few years old; I wonder if the RAID controller firmware needs a patch? I suppose a marginal power supply might cause a similar "hang." Anyway, marginal failures are very painful. Have you checked sar or the OS logs at the time of the event?


This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted.
