Re: Question about memory usage

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Preston Hagar" <prestonh(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question about memory usage
Date: 2014-01-10 19:40:51
Message-ID: 934878c3cbe285d13efc2781518c6459.squirrel@sq.gransy.com
Lists: pgsql-general

On 10 January 2014, 19:19, Tom Lane wrote:
> Preston Hagar <prestonh(at)gmail(dot)com> writes:
>>>> tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors
>>>> despite the server now having 32 GB instead of 4 GB of RAM and the
>>>> workload
>>>> and number of clients remaining the same.
>
>> Here are a couple of examples from the incident we had this morning:
>> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> connection: Cannot allocate memory
>> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> connection: Cannot allocate memory
>
> That's odd ... ENOMEM from fork() suggests that you're under system-wide
> memory pressure.
>
>> [ memory map dump showing no remarkable use of memory at all ]
>> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> 10.1.1.6(36680)ERROR: out of memory
>> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> 10.1.1.6(36680)DETAIL: Failed on request of size 500.
>
> I think that what you've got here isn't really a Postgres issue, but
> a system-level configuration issue: the kernel is being unreasonably
> stingy about giving out memory, and it's not clear why.
>
> It might be worth double-checking that the postmaster is not being
> started under restrictive ulimit settings; though offhand I don't
> see how that theory could account for fork-time failures, since
> the ulimit memory limits are per-process.
>
> Other than that, you need to burrow around in the kernel settings
> and see if you can find something there that's limiting how much
> memory it will give to Postgres. It might also be worth watching
> the kernel log when one of these problems starts. Plain old "top"
> might also be informative as to how much memory is being used.
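
To quickly rule out the ulimit theory, a rough Python sketch like the one below (just a diagnostic, not part of Postgres; the PID argument is an assumption - use whatever the postmaster's PID is, e.g. the first line of postmaster.pid in the data directory) will show the limits the running postmaster actually has. "Max address space" and "Max data size" are the lines a restrictive ulimit -v / -d would affect.

import sys

def show_limits(pid):
    # memory-related lines from /proc/<pid>/limits (Linux only)
    wanted = ("Max address space", "Max resident set", "Max data size",
              "Max processes", "Max open files")
    with open("/proc/%s/limits" % pid) as f:
        for line in f:
            if line.startswith(wanted):
                print(line.rstrip())

if __name__ == "__main__":
    # pass the postmaster PID; defaults to "self", i.e. this script's own limits
    show_limits(sys.argv[1] if len(sys.argv) > 1 else "self")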

My bet is on overcommit - what are

vm.overcommit_memory
vm.overcommit_ratio

set to? Do you have swap or not? I've repeatedly run into very similar
OOM issues on machines with overcommit disabled (overcommit_memory=2) and
no swap. There was plenty of RAM available (either free or in the page
cache), but during a sudden allocation peak the allocations failed.
vm.swappiness also seems to play a role in this.
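
If you want to see how close the box gets to the commit limit when this happens, here's a quick-and-dirty sketch (plain Python reading /proc, nothing Postgres-specific - just a diagnostic to run on the server during one of those peaks):

def read_sysctl(name):
    # sysctl names map to /proc/sys paths, with dots replaced by slashes
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().strip()

def read_meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = value.strip()
    return info

if __name__ == "__main__":
    for name in ("vm.overcommit_memory", "vm.overcommit_ratio", "vm.swappiness"):
        print("%s = %s" % (name, read_sysctl(name)))

    mem = read_meminfo()
    # With overcommit_memory=2 the kernel refuses new allocations once
    # Committed_AS reaches CommitLimit (= SwapTotal + MemTotal * overcommit_ratio/100),
    # even if MemFree/Cached still look comfortable.
    for key in ("MemTotal", "MemFree", "Cached", "SwapTotal",
                "CommitLimit", "Committed_AS"):
        print("%-14s %s" % (key, mem.get(key, "n/a")))

With overcommit_memory=2 and no swap, CommitLimit is just RAM * overcommit_ratio / 100, so Committed_AS can hit it long before MemFree gets anywhere near zero - which would match the symptom of plenty of free memory while fork() and ordinary allocations still fail.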

>>> The weird thing is that our old server had 1/8th the RAM, was set to
>>> max_connections = 600 and had the same clients connecting in the same
>>> way
>>> to the same databases and we never saw any errors like this in the
>>> several
>>> years we have been using it.

Chances are the old machine had swap, overcommit enabled, and/or a higher
swappiness, so it never ran into these limits.

Anyway, I see you've mentioned shmmax/shmall in one of your previous
messages. I'm pretty sure that's irrelevant to the problem, because those
settings only affect the allocation of shared buffers (i.e. shared memory)
at startup. So if the database starts OK, the cause is somewhere else.
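
If you want to double-check them anyway, they can be read the same way (again just a sketch, same idea as above):

def read_sysctl(name):
    # sysctl names map to /proc/sys paths, with dots replaced by slashes
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().strip()

for name in ("kernel.shmmax", "kernel.shmall"):
    print("%s = %s" % (name, read_sysctl(name)))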

kind regards
Tomas Vondra
