From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
Date: 2014-05-17 20:33:31
Message-ID: 5377C79B.7040708@fuzzy.cz
Lists: pgsql-hackers

On 17.5.2014 21:31, Andres Freund wrote:
> On 2014-05-17 20:41:37 +0200, Tomas Vondra wrote:
>> On 17.5.2014 19:55, Tom Lane wrote:
>>> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> The tests are already running, and there are a few postgres processes:
>>
>> PID VIRT RES %CPU TIME+ COMMAND
>> 11478 449m 240m 100.0 112:53.57 postgres: pgbuild regression [local] CREATE VIEW
>> 11423 219m 19m 0.0 0:00.17 postgres: checkpointer process
>> 11424 219m 2880 0.0 0:00.05 postgres: writer process
>> 11425 219m 5920 0.0 0:00.12 postgres: wal writer process
>> 11426 219m 2708 0.0 0:00.05 postgres: autovacuum launcher process
>> 11427 79544 1836 0.0 0:00.17 postgres: stats collector process
>> 11479 1198m 1.0g 0.0 91:09.99 postgres: pgbuild regression [local] CREATE INDEX waiting
>>
>> Attached is 'pmap -x' output for the two interesting processes (11478,
>> 11479).
>
> Could you gdb -p 11479 into the process and issue 'p
> MemoryContextStats(TopMemoryContext)'. That should print information
> about the server's allocation to its stderr.

That process had already finished, but I've done the same for another process
(which had ~400MB allocated and was growing steadily, about 1MB every 10
seconds). At the time it was running a SELECT query, which has since completed.
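
For the record, what I ran was essentially the sequence you suggested, roughly
(with <pid> being the backend in question; the dump goes to that backend's
stderr):

  $ gdb -p <pid>
  (gdb) p MemoryContextStats(TopMemoryContext)
  (gdb) detach
  (gdb) quit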

Then it executed ANALYZE, and I took several snapshots of it - I'm not sure
how much memory it had at the beginning (maybe ~250MB), and then took further
snapshots at ~350MB, 400MB, 500MB and 600MB. It's still running, so I'm not
sure how much more it will grow.

Anyway, the main difference between the analyze snapshots seems to be this:

init: CacheMemoryContext: 67100672 total in 17 blocks; ...
350MB: CacheMemoryContext: 134209536 total in 25 blocks; ...
400MB: CacheMemoryContext: 192929792 total in 32 blocks; ...
500MB: CacheMemoryContext: 293593088 total in 44 blocks; ...
600MB: CacheMemoryContext: 411033600 total in 58 blocks; ...
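
The lines above are just the CacheMemoryContext totals pulled from the
attached snapshots, i.e. something like:

  grep 'CacheMemoryContext:' analyze*.log

The complete dumps are attached.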

I'm not sure whether there's something wrong with the SELECT memory context.
It has ~1500 nested nodes like these:

SQL function data: 24576 total in 2 blocks; ...
ExecutorState: 24576 total in 2 blocks; ...
SQL function data: 24576 total in 2 blocks; ...
ExprContext: 8192 total in 1 blocks; ...

But maybe it's expected / OK.
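
FWIW, the ~1500 is just a rough count of those entries in the attached select
log, something like:

  zcat select.log.gz | grep -c 'SQL function data'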

regards
Tomas

Attachment Content-Type Size
select.log.gz application/gzip 83.1 KB
analyze.log text/x-log 3.0 KB
analyze-350MB.log text/x-log 2.9 KB
analyze-400MB.log text/x-log 2.9 KB
analyze-500MB.log text/x-log 2.9 KB
analyze-600MB.log text/x-log 2.9 KB
