Re: Out of Memory errors are frustrating as heck!

From: Gunther <raj(at)gusw(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gunther <raj(at)gusw(dot)net>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-15 03:59:45
Message-ID: 2256ca91-9dac-1fe1-6b23-02c899f8395f@gusw.net
Lists: pgsql-performance

On 4/14/2019 23:24, Tom Lane wrote:
>> ExecutorState: 2234123384 total in 266261 blocks; 3782328 free (17244 chunks); 2230341056 used
> Oooh, that looks like a memory leak right enough. The ExecutorState
> should not get that big for any reasonable query.
2.2 GB is massive, yes.
> Your error and stack trace show a failure in HashBatchContext,
> which is probably the last of these four:
>
>> HashBatchContext: 57432 total in 3 blocks; 16072 free (6 chunks); 41360 used
>> HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
>> HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
>> HashBatchContext: 100711712 total in 3065 blocks; 7936 free (0 chunks); 100703776 used
> Perhaps that's more than it should be, but it's silly to obsess over 100M
> when there's a 2.2G problem elsewhere.
Yes.
> I think it's likely that it was
> just coincidence that the failure happened right there. Unfortunately,
> that leaves us with no info about where the actual leak is coming from.
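
To make the comparison between contexts less of an eyeball exercise, here is a rough sketch of ranking the lines of a memory context dump by bytes used. It assumes every line has the same shape as the ones quoted above ("name: N total in N blocks; N free (N chunks); N used"). The names parse_contexts and rank_by_used are made up for this illustration and are not part of PostgreSQL.

```python
import re

# Assumed line shape, taken from the dump lines quoted in this thread:
#   Name: <total> total in <blocks> blocks; <free> free (<chunks> chunks); <used> used
LINE_RE = re.compile(
    r"(?P<name>.+?):\s+(?P<total>\d+) total in (?P<blocks>\d+) blocks; "
    r"(?P<free>\d+) free \((?P<chunks>\d+) chunks\); (?P<used>\d+) used"
)


def parse_contexts(dump_text):
    """Return one dict per memory context line found in dump_text."""
    contexts = []
    for line in dump_text.splitlines():
        # Drop email quote markers and indentation before matching.
        line = line.lstrip("> \t")
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a memory context line
        contexts.append({
            "name": m.group("name").strip(),
            "total": int(m.group("total")),
            "blocks": int(m.group("blocks")),
            "free": int(m.group("free")),
            "chunks": int(m.group("chunks")),
            "used": int(m.group("used")),
        })
    return contexts


def rank_by_used(contexts):
    """Sort contexts so the largest consumer comes first."""
    return sorted(contexts, key=lambda c: c["used"], reverse=True)


# The five lines quoted in this message.
SAMPLE = """\
ExecutorState: 2234123384 total in 266261 blocks; 3782328 free (17244 chunks); 2230341056 used
HashBatchContext: 57432 total in 3 blocks; 16072 free (6 chunks); 41360 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
HashBatchContext: 100711712 total in 3065 blocks; 7936 free (0 chunks); 100703776 used
"""

if __name__ == "__main__":
    for ctx in rank_by_used(parse_contexts(SAMPLE)):
        print(f'{ctx["used"]:>14,d}  {ctx["name"]}')
```

Run against the full dump, this puts ExecutorState at the top, with the largest HashBatchContext a distant second.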

Strange, though, that the vmstat tracking never showed the cache
memory going much below 6 GB. Even with this 2.2 GB memory leak,
and even if I had 2 GB of shared_buffers, there should still be
enough memory for the OS to give me.

Is there any suspicion that this might be a problem with Linux? Because
if you want, I can whip out a FreeBSD machine, compile pgsql, attach
the same disk, and try it there. I am longing to have a reason to move
back to FreeBSD anyway. But I have tons of stuff to do, so if you have
no reason to suspect that Linux is at fault here, I prefer to skip that
futile attempt.

> The memory map shows that there were three sorts and four hashes going
> on, so I'm not sure I believe that this corresponds to the query plan
> you showed us before.
Like I said, the first explain was not using the same constraints (no
NL). What I sent last should all be consistent: memory dump, explain
plan, and gdb backtrace.
> Any chance of extracting a self-contained test case that reproduces this?

With 18 million rows involved in the base tables, hardly.

But I am ready to try some other things with the debugger that you want
me to try. If we have a memory leak issue, we might just as well try to 
plug it!

I could even give one of you access to the system that runs this.

thanks,
-Gunther
