Re: High SYS CPU - need advise

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Vlad <marchenko(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: High SYS CPU - need advise
Date: 2012-11-16 15:28:32
Message-ID: CAHyXU0wJzGCdg9gKdRdnWvXaATia42BZa=DCLYhQ=u-qLtG++w@mail.gmail.com
Lists: pgsql-general

On Thu, Nov 15, 2012 at 6:07 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Nov 15, 2012 at 2:44 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>>>> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 3000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 6000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 7000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 9000}) = 0 (Timeout)
>
> This is not entirely inconsistent with the spinlock. Note that 1000
> is repeated 3 times, and 5000 is missing.
>
> This might just be a misleading random sample we got here. I've seen
> similar close spacing in some simulations I've run.
>
> It is not clear to me why we use a resolution of 1 msec here. If the
> OS's implementation of select() eventually rounds to the nearest msec,
> that is its business. But why do we have to lose intermediate
> precision due to its decision?

Yeah -- you're right, this is definitely a spinlock issue. Next steps:

*) in mostly-read workloads, we have a couple of known frequent
offenders, in particular the 'BufFreelistLock'. One way we can
influence that lock is to significantly lower or raise
shared_buffers, so that is one thing to try.

*) failing that, the LWLOCK_STATS macro can be compiled in to give us
some information about the particular lock(s) we're blocking on.
Hopefully it's an lwlock -- this will make diagnosing the problem
easier.

*) if we're not blocking on an lwlock, it's possibly a buffer pin
related issue? I've seen this before, for example on an index scan
that is dependent on a seq scan. This long thread:
"http://postgresql.1045698.n5.nabble.com/9-2beta1-parallel-queries-ReleasePredicateLocks-CheckForSerializableConflictIn-in-the-oprofile-td5709812i100.html"
has a lot of information about that case and deserves a review.

*) we can consider experimenting with futex
(http://archives.postgresql.org/pgsql-hackers/2012-06/msg01588.php)
to see if things improve. This is dangerous, and could crash your
server/eat your data, so fair warning.
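
For reference, the select() timeouts quoted above fit the sleep
pattern of the spinlock backoff loop in s_lock(). Below is a rough
Python sketch of that loop, not the actual C code. The constants
(1 msec minimum, 1000 msec maximum) reflect my reading of s_lock.c and
should be checked against the source tree in use; the names next_delay
and backoff_sequence are made up for illustration.

```python
import random

# Assumed values; verify against s_lock.c in the PostgreSQL version in use.
MIN_DELAY_MSEC = 1
MAX_DELAY_MSEC = 1000


def next_delay(cur_delay, rng):
    """Grow the delay to between 1x and 2x its current value (hypothetical helper)."""
    # rng.random() is in [0, 1), so the increment is between 0 and cur_delay.
    cur_delay += int(cur_delay * rng.random() + 0.5)
    # Wrap back to the minimum once the maximum is exceeded.
    if cur_delay > MAX_DELAY_MSEC:
        cur_delay = MIN_DELAY_MSEC
    return cur_delay


def backoff_sequence(n, seed=None):
    """Return the first n sleep durations (msec) of one contended acquire."""
    rng = random.Random(seed)
    delays = []
    cur = MIN_DELAY_MSEC
    for _ in range(n):
        delays.append(cur)  # this is the value that would be slept
        cur = next_delay(cur, rng)
    return delays


if __name__ == "__main__":
    print(backoff_sequence(10))
```

Each value is a delay in msec, so a run such as 1, 1, 2, 3, 5 would
show up in strace as select() timeouts of {0, 1000}, {0, 1000},
{0, 2000}, {0, 3000}, {0, 5000}. Repeated small values and skipped
values are both possible under this scheme because the increment is
random.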

merlin
