Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
Date: 2014-05-17 18:41:37
Message-ID: 5377AD61.70707@fuzzy.cz
Lists: pgsql-hackers

On 17.5.2014 19:55, Tom Lane wrote:
> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> ... then of course the usual 'terminating connection because of crash of
>> another server process' warning. Apparently, it's getting killed by the
>> OOM killer, because it exhausts all the memory assigned to that VM (2GB).
>
> Can you fix things so it runs into its process ulimit before the OOM killer
> triggers? Then we'd get a memory map dumped to stderr, which would be
> helpful in localizing the problem.

I did this in /etc/security/limits.d/80-pgbuild.conf:

pgbuild hard as 1835008

so the user the buildfarm runs under is limited to ~1.75GB of address
space (of the 2GB of RAM available to the container).
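
For reference, the 'as' item in limits.conf is given in KB, so 1835008 works out to 1.75GB. A minimal Python sketch of that arithmetic, plus a way to check which address-space limit a process actually inherited. The function names are hypothetical, made up for this illustration, and the resource module is Unix-only:

```python
import resource

KIB_PER_GIB = 1024 * 1024


def gib_to_limits_kib(gib):
    """Convert GiB to the KiB units used by the 'as' item in limits.conf."""
    return int(gib * KIB_PER_GIB)


def current_as_limit_kib():
    """Return this process's hard address-space limit in KiB, or None if unlimited."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard == resource.RLIM_INFINITY:
        return None
    return hard // 1024  # getrlimit reports bytes


if __name__ == "__main__":
    print("limits.conf value for 1.75 GiB:", gib_to_limits_kib(1.75))
    print("hard RLIMIT_AS seen by this process (KiB):", current_as_limit_kib())
```

Run as the pgbuild user after a fresh login, the second line should report the configured value if the limit was picked up.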

>
>> ... So this seems like a
>> memory leak somewhere in the cache invalidation code.
>
> Smells that way to me too, but let's get some more evidence.

The tests are already running, and there are a few postgres processes:

  PID   VIRT   RES  %CPU     TIME+  COMMAND
11478   449m  240m 100.0 112:53.57  postgres: pgbuild regression [local] CREATE VIEW
11423   219m   19m   0.0   0:00.17  postgres: checkpointer process
11424   219m  2880   0.0   0:00.05  postgres: writer process
11425   219m  5920   0.0   0:00.12  postgres: wal writer process
11426   219m  2708   0.0   0:00.05  postgres: autovacuum launcher process
11427  79544  1836   0.0   0:00.17  postgres: stats collector process
11479  1198m  1.0g   0.0  91:09.99  postgres: pgbuild regression [local] CREATE INDEX waiting
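
To see how close the two backends are to the 1835008 KB limit, the VIRT values can be normalized to KB. A rough Python sketch, assuming top's usual scaling (no suffix means KB, 'm' and 'g' are binary multiples); the helper name is hypothetical:

```python
LIMIT_KIB = 1835008  # the 'as' hard limit from limits.conf

# Assumed top suffixes: plain numbers are KiB, m = MiB, g = GiB.
_SCALE = {"k": 1, "m": 1024, "g": 1024 * 1024}


def top_size_to_kib(text):
    """Convert a top memory field such as '449m', '1.0g' or '79544' to KiB."""
    text = text.strip().lower()
    if text[-1] in _SCALE:
        return int(float(text[:-1]) * _SCALE[text[-1]])
    return int(text)


if __name__ == "__main__":
    # VIRT values of the two interesting backends from the listing above.
    virt = {11478: "449m", 11479: "1198m"}
    for pid, size in virt.items():
        kib = top_size_to_kib(size)
        print(f"{pid}: {kib} KiB, {100 * kib / LIMIT_KIB:.0f}% of the limit")
```

Since the limit applies to address space, VIRT rather than RES is the column to compare against it.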

Attached is 'pmap -x' output for the two interesting processes (11478,
11479).

Tomas

Attachment Content-Type Size
11478.pmap.log text/x-log 9.6 KB
11479.pmap.log text/x-log 9.6 KB
