From: Scott Carey <scott(at)richrelevance(dot)com>
To: Mike Ivanov <mikei(at)activestate(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Sean Ma <seanxma(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: random slow query
Date: 2009-07-01 00:11:19
Message-ID: C66FF7B7.8EFF%scott@richrelevance.com
Lists: pgsql-performance
On 6/30/09 2:39 PM, "Mike Ivanov" <mikei(at)activestate(dot)com> wrote:
> Scott Carey wrote:
>>> 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239
>> That's not ridiculous at all. Modern OSes handle thousands of idle
>> processes just fine.
>>
>>
> I meant that 27 was a ridiculously small number.
>
>> Or you can control the behavior with the following kernel params:
>> vm.swappiness
>> vm.dirty_ratio
>> vm.dirty_background_ratio
>>
> Thanks for pointing that out!
>
>> Actually, no. When a process wakes up only the pages that are needed are
>> accessed. For most idle processes that wake up from time to time, a small
>> bit of work is done, then they go back to sleep. This initial allocation
>> does NOT come from the page cache, but from the "buffers" line in top. The
>> OS tries to keep some amount of free buffers not allocated to processes or
>> pages available, so that allocation demands can be met without having to
>> synchronously decide which buffers from page cache to eject.
>>
> Wait a second, I'm trying to understand that :-)
> Did you mean that FS cache pages are first allocated from the buffer
> pages or that process memory being paged out to swap is first written to
> buffers? Could you clarify please?
>
There are some kernel parameters that control how much RAM the OS tries to
keep in a state that is not allocated to page cache or processes. I've
forgotten what these are exactly.
But the purpose is to prevent the virtual memory system from having to decide
what memory to evict from the page cache, or which pages to swap to disk, at
the moment memory is allocated. Instead, it can do this in the background most
of the time. So, the first use of this is when a process allocates memory.
Pulling a swapped page off disk probably uses this pool too, but I'm not sure;
it would make sense. Pages being written to swap go directly to swap and are
then deallocated.
File pages are either on disk or in the page cache. Process pages are
either in memory or swap.
But when either of these is first brought into memory (process allocation,
page-in, file read), the OS can either quickly allocate to the process or the
page cache from the free buffers, or more slowly reclaim from the page cache,
or, more slowly still, page out a process page.
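A quick way to see that split on a Linux box is /proc/meminfo (the same
numbers top summarizes), which reports the free pool, raw block-device
buffers, and the file page cache separately:

```shell
# Show how physical RAM is currently divided between the free pool,
# block-device buffers, and the file page cache (values in kB).
grep -E '^(MemTotal|MemFree|Buffers|Cached):' /proc/meminfo
```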
>> If queries are intermittently causing problems, it might be due to
>> checkpoints. Make sure that the kernel parameters for
>> dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less.
>>
> Scott, isn't dirty_ratio supposed to be less than
> dirty_background_ratio? I've heard that the system would automatically set
> dirty_ratio = dirty_background_ratio / 2 if that's not the case. Also,
> how could dirty_ratio be less than 5 if 5 is the minimal value?
>
dirty_ratio is the percentage of RAM that can be in the page cache and not
yet written to disk before all writes in the system block.
dirty_background_ratio is the percentage of RAM that can be filled with
dirty file pages before a background thread is started by the OS to start
flushing to disk. Flushing to disk also occurs on timed intervals or other
triggers.
By default, Linux 2.6.18 (RHEL5/Centos5, etc) has the former at 40 and the
latter at 10, which on a 32GB system means over 13GB can be in memory and
not yet on disk! Sometime near 2.6.22 or so the default became 10 and 5,
respectively. For some systems, this is still too much.
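You can check what your kernel is using right now without root, since the
values are exposed under /proc/sys (a sketch; the exact defaults depend on
your kernel version):

```shell
# Read the current writeback thresholds (percent of total RAM).
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio

# To lower them at runtime (as root), e.g.:
#   sysctl -w vm.dirty_ratio=10
#   sysctl -w vm.dirty_background_ratio=5
# and put the same settings in /etc/sysctl.conf to persist across reboots.
```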
I like to use the '5 second rule': size dirty_background_ratio so that, under
optimal conditions, the dirty data takes about 5 seconds to flush to disk.
dirty_ratio should be 2x to 5x that, depending on your application's needs;
for a system with well-tuned postgres checkpoints, smaller tends to be better,
to limit stalls while waiting for the checkpoint fsync to finish.
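As a back-of-envelope illustration of that rule (the RAM size and flush rate
below are hypothetical; plug in your own numbers):

```shell
# Size dirty_background_ratio so ~5 seconds of writeback fits in it.
# Hypothetical inputs: 32 GB RAM, array sustains ~300 MB/s of writeback.
ram_mb=$((32 * 1024))
flush_mb_per_s=300

# Dirty data flushable in ~5 seconds, as a rounded percent of RAM.
bg_mb=$((flush_mb_per_s * 5))
bg_ratio=$(( (100 * bg_mb + ram_mb / 2) / ram_mb ))

echo "dirty_background_ratio ~ ${bg_ratio}"                # about 5
echo "dirty_ratio ~ $((2 * bg_ratio))-$((5 * bg_ratio))"   # 2x-5x: 10-25
```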
> Regards,
> Mike
>
>