From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Vlad <marchenko(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: High SYS CPU - need advise |
Date: | 2012-11-16 15:28:32 |
Message-ID: | CAHyXU0wJzGCdg9gKdRdnWvXaATia42BZa=DCLYhQ=u-qLtG++w@mail.gmail.com |
Lists: | pgsql-general |
On Thu, Nov 15, 2012 at 6:07 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Nov 15, 2012 at 2:44 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>>>> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 3000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 6000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 7000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
>>>> select(0, NULL, NULL, NULL, {0, 9000}) = 0 (Timeout)
>
> This is not entirely inconsistent with the spinlock. Note that 1000
> is repeated 3 times, and 5000 is missing.
>
> This might just be a misleading random sample we got here. I've seen
> similar close spacing in some simulations I've run.
>
> It is not clear to me why we use a resolution of 1 msec here. If the
> OS's implementation of select() eventually rounds to the nearest msec,
> that is its business. But why do we have to lose intermediate
> precision due to its decision?
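A small simulation may help show why the strace pattern above fits a spinlock backoff. The sketch is modeled on the randomized backoff in the 9.2-era `s_lock()` (src/backend/storage/lmgr/s_lock.c) as I understand it. Each round sleeps, then grows the delay by a random factor between 1x and 2x, rounded to whole milliseconds. Once the delay passes one second, it wraps back to the minimum. This is an assumption-laden model, not PostgreSQL's code. The function name `spin_delays` and its parameters are hypothetical, and the spinning between sleeps and the stuck-lock check are left out.

```python
import random


def spin_delays(n, resolution_us=1000, min_delay_us=1000,
                max_delay_us=1_000_000, rng=None):
    """Return the first n sleep lengths, in microseconds, of a randomized
    spinlock backoff (hypothetical model, not PostgreSQL's actual code).

    resolution_us is the unit the delay is tracked in: 1000 mimics the
    whole-millisecond arithmetic, 1 keeps microsecond precision.
    """
    rng = rng if rng is not None else random.Random()
    unit_min = min_delay_us // resolution_us
    unit_max = max_delay_us // resolution_us
    cur = unit_min
    sleeps = []
    for _ in range(n):
        # The backend would sleep (select() with this timeout) here.
        sleeps.append(cur * resolution_us)
        # Grow by a random fraction between 1x and 2x, rounded to a whole unit.
        # At 1 ms resolution a 1 ms delay adds int(r + 0.5) ms, i.e. 0 when r < 0.5.
        cur += int(cur * rng.random() + 0.5)
        # Wrap back to the minimum once the maximum is exceeded.
        if cur > unit_max:
            cur = unit_min
    return sleeps


if __name__ == "__main__":
    for res in (1000, 1):
        print(res, spin_delays(10, resolution_us=res, rng=random.Random(42)))
```

With `resolution_us=1000`, a 1 ms delay stays at 1 ms about half the time. That makes repeated `{0, 1000}` timeouts and skipped values such as 5000 unsurprising. With `resolution_us=1`, the delay almost never repeats, which is the intermediate precision Jeff is asking about.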
Yeah, you're right; this is definitely a spinlock issue. Next steps:
*) in mostly-read workloads, we have a couple of known frequent
offenders, in particular the 'BufFreelistLock'. One way to
influence that guy is to significantly lower or raise
shared_buffers, so that is one thing to try.
*) failing that, the LWLOCK_STATS macro can be compiled in to give us
some information about the particular lock(s) we're contending on.
Hopefully it's an lwlock; that will make diagnosing the problem easier.
*) if we're not blocking on an lwlock, is it possibly a buffer-pin-related
issue? I've seen this before, for example with an index scan that is
dependent on a seq scan. This long thread has a lot of information
about that case and deserves a review:
http://postgresql.1045698.n5.nabble.com/9-2beta1-parallel-queries-ReleasePredicateLocks-CheckForSerializableConflictIn-in-the-oprofile-td5709812i100.html
*) we can consider experimenting with futexes
(http://archives.postgresql.org/pgsql-hackers/2012-06/msg01588.php)
to see if things improve. This is dangerous and could crash your
server or eat your data, so fair warning.
merlin