Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

From: Nikolai Zhubr <n-a-zhubr(at)yandex(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).
Date: 2017-02-03 10:52:26
Message-ID: 589460EA.2010404@yandex.ru
Lists: pgsql-general

02.02.2017 2:14, I wrote:
> 01.02.2017 1:02, I wrote:
> [...]
>>> Could you use process monitor or such to see what the process is doing
>>> while using a lot of CPU?
>>
>> I'm not sure how to do this, especially considering that the process in
>> question is running as a service?
>>
>> Now, some more input:
>>
>> * 9.5.2 server running on linux x86_64 - unaffected! (What a relief! We
>> are moving to Centos soon anyway!)
>>
>> * 9.4.4 server running on win7 32-bit - affected, same thing as on XP.
>
> I've managed to create a "fix" (see diff below).
> It looks like the wait logic is somehow broken on windows currently,
> though I can not find the problem myself yet.
> It would be great if someone more familiar with the (windows-specific)
> code came up with ideas.
> I have a build environment ready so I could do more tests then.

Some update.

Adding this "Sleep(15)" before "goto retry" in secure_read() has
apparently eliminated the effect on our production server too: my
load-bug-detector has been quiet for more than 24 hours.
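
To illustrate the shape of the workaround (this is only a rough sketch, not the actual be-secure.c code): FakeSocket, fake_read() and fake_sleep() below are hypothetical stand-ins for the real socket read and the Windows Sleep() call. Each "would block" result costs one fixed delay before the retry, so with a 15 ms delay a backend can go around the loop at most about 66 times per second instead of spinning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-blocking socket; not a PostgreSQL type. */
typedef struct
{
    int      polls_until_ready; /* reads that "would block" before data arrives */
    int      read_attempts;     /* how many times the read was tried */
    int      sleeps;            /* how many times the throttle kicked in */
    unsigned slept_ms;          /* total simulated delay */
} FakeSocket;

/* Returns -1 for "would block", otherwise the number of bytes read. */
static long
fake_read(FakeSocket *s, char *buf, size_t len)
{
    s->read_attempts++;
    if (s->polls_until_ready > 0)
    {
        s->polls_until_ready--;
        return -1;
    }
    if (len == 0)
        return 0;
    buf[0] = 'x';
    return 1;
}

/* Stand-in for Sleep(ms): only records the delay, so the sketch runs anywhere. */
static void
fake_sleep(FakeSocket *s, unsigned ms)
{
    s->sleeps++;
    s->slept_ms += ms;
}

/* Same control flow as the patched loop: wait, throttle, then retry. */
static long
read_with_throttled_retry(FakeSocket *s, char *buf, size_t len, unsigned delay_ms)
{
    long n;

    assert(s != NULL && buf != NULL);
retry:
    n = fake_read(s, buf, len);
    if (n < 0)
    {
        /* The real code waits on the latch and the socket at this point. */
        fake_sleep(s, delay_ms);    /* the added throttle */
        goto retry;
    }
    return n;
}
```

This only hides the symptom, of course: it bounds the retry rate without explaining why the wait returns early.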

Now, by adding more debugging output to secure_read() and secure_write(),
I've found that:

* secure_write() is likely irrelevant, as "goto retry" there has never
actually been hit so far;

* in secure_read(), during the intervals of excessive CPU load,
WaitLatchOrSocket() was never observed to indicate a latch event. It was
also never observed to (erroneously) indicate socket readiness more than
once with a socket read attempt in between, which is what I had
suspected. So I cannot blame secure_read() itself, and all this makes me
wonder even more...
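
The kind of bookkeeping behind these observations can be sketched roughly as follows. This is an illustration, not the debugging code actually used: WaitStats, record_wait(), record_read() and the EV_* flags are hypothetical names, and the real WL_* event constants in PostgreSQL have their own values. The idea is to count every wait result by type, and separately count the cases where readiness was reported but the following read still would block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event flags, standing in for the real WL_* constants. */
enum
{
    EV_LATCH = 1 << 0,
    EV_SOCKET_READABLE = 1 << 1
};

/* Hypothetical per-backend counters. */
typedef struct
{
    unsigned long waits;          /* wait calls that returned */
    unsigned long latch_events;   /* returns that reported a latch event */
    unsigned long socket_ready;   /* returns that reported socket readiness */
    unsigned long spurious_ready; /* readiness reported, read still would block */
} WaitStats;

/* Call right after the wait returns, with the event mask it reported. */
static void
record_wait(WaitStats *st, int events)
{
    assert(st != NULL);
    st->waits++;
    if (events & EV_LATCH)
        st->latch_events++;
    if (events & EV_SOCKET_READABLE)
        st->socket_ready++;
}

/* Call after the read attempt that follows a wait. */
static void
record_read(WaitStats *st, int last_events, int would_block)
{
    assert(st != NULL);
    if ((last_events & EV_SOCKET_READABLE) && would_block)
        st->spurious_ready++;
}
```

With counters like these, a wait that wrongly reported readiness would show up as a growing spurious_ready, and latch wakeups as a growing latch_events.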

Note: I'm testing with SSL off now.

As always, any hints are greatly appreciated!

Thank you.
Nikolai

>
> --- be-secure.c.orig 2017-02-01 22:37:37.228032608 +0300
> +++ be-secure.c 2017-02-01 22:51:17.655751292 +0300
> @@ -159,6 +159,7 @@
> * socket to become ready again.
> */
> }
> + Sleep(15); /* n.zhubr */
> goto retry;
> }
>
> @@ -238,6 +239,7 @@
> * socket to become ready again.
> */
> }
> + Sleep(15); /* n.zhubr */
> goto retry;
> }
>
>
> Thank you.
>
> Nikolai
>
>>
>>
>> Thank you.
>>
>> Nikolai
>>
>>>
>>> Regards,
>>>
>>> Andres
>>>
>>
>>
>>
>
>
>
