From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Rambabu V <ram(dot)wissen(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: need help on memory allocation
Date: 2018-01-23 22:36:53
Message-ID: CAMkU=1zG0EP36=rGWCwrCO6OLCGJdcCuUbv7yDj5AnqvpRZ=Ug@mail.gmail.com
Lists: pgsql-performance
On Tue, Jan 23, 2018 at 5:59 AM, Rambabu V <ram(dot)wissen(at)gmail(dot)com> wrote:
> > cat PostgreSQL-2018-01-23_060000.csv|grep FATAL
What about ERROR, not just FATAL? Or grep for "out of memory"?
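For example (just a sketch, reusing the CSV log file name from your
command above):

    # look for ERROR lines as well as FATAL ones
    grep -E 'ERROR|FATAL' PostgreSQL-2018-01-23_060000.csv

    # and search for out-of-memory messages specifically
    grep -i 'out of memory' PostgreSQL-2018-01-23_060000.csv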
>> $ free -mh
>>                     total   used   free  shared  buffers  cached
>> Mem:                  58G    58G   358M     16G     3.6M     41G
>> -/+ buffers/cache:            16G    42G
>> Swap:                9.5G   687M   8.9G
>>
>
This does not seem like it should be a problem. Was this data collected
near the time of the failure?
> work_mem = 256MB # min 64kB
>> max_connections = 600
>>
>
These look pretty high, especially in combination. Why do you need that
number of connections? Could you use a connection pooler instead? Or do
you just have an application bug (leaked connection handles) that needs to
be fixed? Why do you need that amount of work_mem?
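To put rough numbers on it (a back-of-the-envelope worst case; each
backend can actually use work_mem once per sort or hash node, so it can be
even worse than this):

    600 connections x 256 MB work_mem = ~150 GB

That is far more than the 58 GB of RAM in your free output, before even
counting the ~16 GB of shared memory visible there.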
> ps -ef|grep postgres|grep idle|wc -l
>> 171
>>
>> ps -ef|grep postgres|wc -l
>> 206
>>
>
How close to the time of the problem was this recorded? How many of the
idle connections are 'idle in transaction'?
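If you can catch it while it is happening, something like this would show
the breakdown (a sketch; the query assumes a server new enough to have the
state column in pg_stat_activity, i.e. 9.2 or later):

    # count backends that are idle inside an open transaction
    ps -ef | grep postgres | grep 'idle in transaction' | wc -l

    # or ask the server directly
    psql -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;"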
>> PID    USER     PRI NI VIRT  RES   SHR   S CPU% MEM% TIME+    Command
>> 109063 postgres  20  0 16.7G 16.4G 16.3G S  0.0 27.8 39:55.61 postgres: test sss 10.20.2.228(55174) idle
>> 24910  postgres  20  0 16.7G 16.4G 16.3G S  0.0 27.8 27:45.35 postgres: testl sss 10.20.2.228(55236) idle
>> 115539 postgres  20  0 16.7G 16.4G 16.3G S  0.0 27.8 28:22.89 postgres: test sss 10.20.2.228(55184) idle
>> 9816   postgres  20  0 16.7G 16.4G 16.3G S  0.0 27.8 40:19.57 postgres: test sss 10.20.2.228(55216) idle
>>
>
How close to the time of the problem was this recorded? Nothing here seems
to be a problem, because almost all the memory they have resident is shared
memory.
It looks like all your clients decide to run a memory-hungry query
simultaneously, consume a lot of work_mem, and cause a problem. Then, by
the time you notice the problem and start collecting information, they are
done and things are back to normal.
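If you want to catch it in the act, one option (my suggestion, not
something you already have in place; the interval and output path are just
placeholders) is to sample memory and backend activity every few seconds
and keep the output around:

    # rough sampling loop; stop it with Ctrl-C once the problem has recurred
    while true; do
        date
        free -mh
        ps -o pid,rss,state,cmd -C postgres --sort=-rss | head -20
        sleep 5
    done >> /tmp/pg_memory_samples.log 2>&1

Then when the next failure shows up in the server log, you can look at the
samples taken just before it, instead of whatever the system looks like
afterwards.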
Cheers,
Jeff