Re: Question about memory usage

From: Preston Hagar <prestonh(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question about memory usage
Date: 2014-01-10 20:33:46
Message-ID: CAK6zN=0FPTRGz-3vPX4vZ-WwCu=EMUYBTvSoxhptuZtVSsdhGA@mail.gmail.com
Lists: pgsql-general

On Fri, Jan 10, 2014 at 12:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Preston Hagar <prestonh(at)gmail(dot)com> writes:
>> >>> tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors
>> >>> despite the server now having 32 GB instead of 4 GB of RAM and the
>> >>> workload and number of clients remaining the same.
>>
>> > Here are a couple of examples from the incident we had this morning:
>> > 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> > connection: Cannot allocate memory
>> > 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> > connection: Cannot allocate memory
>>
>> That's odd ... ENOMEM from fork() suggests that you're under system-wide
>> memory pressure.
>>
>> > [ memory map dump showing no remarkable use of memory at all ]
>> > 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> > 10.1.1.6(36680)ERROR: out of memory
>> > 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> > 10.1.1.6(36680)DETAIL: Failed on request of size 500.
>>
>> I think that what you've got here isn't really a Postgres issue, but
>> a system-level configuration issue: the kernel is being unreasonably
>> stingy about giving out memory, and it's not clear why.
>>
>> It might be worth double-checking that the postmaster is not being
>> started under restrictive ulimit settings; though offhand I don't
>> see how that theory could account for fork-time failures, since
>> the ulimit memory limits are per-process.
>>
>> Other than that, you need to burrow around in the kernel settings
>> and see if you can find something there that's limiting how much
>> memory it will give to Postgres. It might also be worth watching
>> the kernel log when one of these problems starts. Plain old "top"
>> might also be informative as to how much memory is being used.
>>
>
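On the ulimit theory: a running postmaster's effective limits can be read straight out of /proc without restarting anything. A quick sketch (it inspects the current shell via $$; substitute the postmaster's PID):

```shell
# Effective resource limits of a running process, readable without
# restarting it. $$ is this shell's PID; substitute the postmaster's
# PID (first line of postmaster.pid in the data directory).
grep -E 'Max (address space|data size|processes)' "/proc/$$/limits"
```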
Thanks for the response. I think it might have been the lack of a
swapfile (I replied as such in another response).
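If it was the missing swap, the mechanism would be the commit limit: with vm.overcommit_memory = 2 (which we had set), the kernel refuses allocations once committed memory reaches swap plus overcommit_ratio percent of RAM, and the default ratio of 50 leaves a swapless 32 GB box with only ~16 GB to hand out. A sketch for checking this (the ratio value shown is illustrative only):

```shell
# How much memory the kernel is willing to promise, in kB.
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

# With vm.overcommit_memory = 2:
#   CommitLimit = swap + RAM * overcommit_ratio / 100
# so no swap plus the default ratio of 50 caps a 32 GB box at ~16 GB.
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio
# echo 80 > /proc/sys/vm/overcommit_ratio   # as root; 80 is illustrative
```

"could not fork new process for connection: Cannot allocate memory" is exactly what hitting CommitLimit looks like from the postmaster's side.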

> That said, we have been using this site as a guide to try to figure things
> out about postgres and memory:
>
> http://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/
>
> we came up with the following for all our current processes (we aren't out
> of memory and new connections are being accepted right now, but free memory
> seems low):
>
> 1. List of RSS usage for all postgres processes:
>
> http://pastebin.com/J7vy846k
>
> 2. List of all memory segments for postgres checkpoint process (pid 30178)
>
> grep -B1 -E '^Size: *[0-9]{6}' /proc/30178/smaps
> 7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473
> /dev/zero (deleted)
> Size: 8067312 kB
>
> 3. Info on largest memory allocation for postgres checkpoint process. It
> is using 5GB of RAM privately.
>
> cat /proc/30178/smaps | grep 7f208acec000 -B 0 -A 20
>
> Total RSS: 11481148
> 7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473
> /dev/zero (deleted)
> Size: 8067312 kB
> Rss: 5565828 kB
> Pss: 5284432 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 428840 kB
> Private_Clean: 0 kB
> Private_Dirty: 5136988 kB
> Referenced: 5559624 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap: 0 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> Locked: 0 kB
> 7f2277328000-7f22775f1000 r--p 00000000 09:00 2889301
> /usr/lib/locale/locale-archive
> Size: 2852 kB
> Rss: 8 kB
> Pss: 0 kB
> Shared_Clean: 8 kB
> Shared_Dirty: 0 kB
>
> If I am understanding all this correctly, the postgres checkpoint process
> has around 5GB of RAM "Private_Dirty" allocated (not shared buffers). Is
> this normal? Any thoughts as to why this would get so high?
>
> I'm still trying to dig in further to figure out exactly what is going on.
> We are running on Ubuntu 12.04.3 (kernel 3.5.0-44). We set
> vm.overcommit_memory = 2 but didn't have a swap partition; we have since
> added one and are seeing if that helps.
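One caveat about the per-process RSS list above: plain RSS counts every shared_buffers page once per backend that has touched it, so the per-process numbers overstate the true total. Summing Pss instead divides each shared page among the processes mapping it, as the depesz post describes. A sketch (shown against /proc/self as a stand-in; point it at each postgres PID):

```shell
# Sum Pss (proportional set size) for one process, in kB. Each page
# shared by N processes contributes size/N, so summing this across
# all postgres PIDs avoids counting shared_buffers once per backend.
# /proc/self is a stand-in; use /proc/<postgres pid>/smaps instead.
awk '/^Pss:/ { kb += $2 } END { print kb " kB" }' /proc/self/smaps
```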
>
>
>> > We had originally copied our shared_buffers, work_mem, wal_buffers and
>> > other similar settings from our old config, but after getting the memory
>> > errors have tweaked them to the following:
>> >
>> > shared_buffers = 7680MB
>> > temp_buffers = 12MB
>> > max_prepared_transactions = 0
>> > work_mem = 80MB
>> > maintenance_work_mem = 1GB
>> > wal_buffers = 8MB
>> > max_connections = 350
>>
>> That seems like a dangerously large work_mem for so many connections;
>> but unless all the connections were executing complex queries, which
>> doesn't sound to be the case, that isn't the immediate problem.
>>
>>
> Thanks for the heads up. We originally arrived at that value using pgtune
> with (I think) 250 connections, and I forgot to lower work_mem when I
> raised max_connections. I now have it set to 45MB; does that seem more
> reasonable?
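For a back-of-envelope check (a worst case, and if anything an underestimate, since each sort or hash step in a single query can claim its own work_mem allocation):

```shell
# Worst-case work_mem envelope: every connection doing one full
# work_mem-sized sort at once, on top of shared_buffers.
max_connections=350
echo "old: $(( max_connections * 80 )) MB"   # work_mem = 80MB -> 28000 MB
echo "new: $(( max_connections * 45 )) MB"   # work_mem = 45MB -> 15750 MB
```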
>
>> >> The weird thing is that our old server had 1/8th the RAM, was set to
>> >> max_connections = 600 and had the same clients connecting in the same way
>> >> to the same databases and we never saw any errors like this in the
>> >> several years we have been using it.
>>
>> This reinforces the impression that something's misconfigured at the
>> kernel level on the new server.
>>
>> regards, tom lane
Forgot to copy the list on the reply, so I am re-sending it here.

> Thanks for your help and time.
>
> Preston
>
