Re: Out of Memory errors are frustrating as heck!

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gunther <raj(at)gusw(dot)net>, pgsql-performance(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-20 20:11:57
Message-ID: 20190420201157.m3ybatfcaplz6lur@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sat, Apr 20, 2019 at 02:30:09PM -0500, Justin Pryzby wrote:
>On Sun, Apr 14, 2019 at 11:24:59PM -0400, Tom Lane wrote:
>> Gunther <raj(at)gusw(dot)net> writes:
>> > ExecutorState: 2234123384 total in 266261 blocks; 3782328 free (17244 chunks); 2230341056 used
>>
>> Oooh, that looks like a memory leak right enough. The ExecutorState
>> should not get that big for any reasonable query.
>
>On Tue, Apr 16, 2019 at 11:30:19AM -0400, Tom Lane wrote:
>> Hmm ... this matches up with a vague thought I had that for some reason
>> the hash join might be spawning a huge number of separate batches.
>> Each batch would have a couple of files with associated in-memory
>> state including an 8K I/O buffer, so you could account for the
>
>On Tue, Apr 16, 2019 at 10:24:53PM -0400, Gunther wrote:
>> -> Hash (cost=2861845.87..2861845.87 rows=34619 width=74) (actual time=199792.446..199792.446 rows=113478127 loops=1)
>> Buckets: 65536 (originally 65536) Batches: 131072 (originally 2) Memory Usage: 189207kB
>
>Is it significant that there are ~2x as many ExecutorState blocks as there are
>batches ? 266261/131072 => 2.03...
>

IMO that confirms this is the issue with BufFile I just described, because
the struct is >8K, so it's allocated as a separate block (it exceeds the
threshold in AllocSet). And we have two BufFile(s) for each batch, because
we need to batch both the inner and outer relations.

>If there was 1 blocks leaked when batch=2, and 2 blocks leaked when batch=4,
>and 4 blocks leaked when batch=131072, then when batch=16, there'd be 64k
>leaked blocks, and 131072 total blocks.
>
>I'm guessing Tom probably already thought of this, but:
>2230341056/266261 => ~8376

Well, the BufFile is 8272 on my system, so that's pretty close ;-)

>which is pretty close to the 8kB I/O buffer you were talking about (if the
>number of same-sized buffers much greater than other allocations).
>
>If Tom thinks (as I understand) that the issue is *not* a memory leak, but out
>of control increasing of nbatches, and failure to account for their size...then
>this patch might help.
>
>The number of batches is increased to avoid exceeding work_mem. With very low
>work_mem (or very larger number of tuples hashed), it'll try to use a large
>number of batches. At some point the memory used by BatchFiles structure
>(increasing by powers of two) itself exceeds work_mem.
>
>With larger work_mem, there's less need for more batches. So the number of
>batches used for small work_mem needs to be constrained, either based on
>work_mem, or at all.
>
>With my patch, the number of batches is nonlinear WRT work_mem, and reaches a
>maximum for moderately small work_mem. The goal is to choose the optimal
>number of batches to minimize the degree to which work_mem is exceeded.
>

Yeah. The patch might be enough for debugging, but it's clearly not
something we could adopt as is, because we increase the number of batches
for a reason - we need to do that to keep the amount of memory needed for
the hash table contents (i.e. rows) below work_mem. If you just cap the
number of batches, you'll keep the amount of memory for BufFile under
control, but the hash table may exceed work_mem.

Considering how rare this issue likely is, we need to be looking for a
solution that does not break the common case.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2019-04-20 20:26:34 Re: Out of Memory errors are frustrating as heck!
Previous Message Tomas Vondra 2019-04-20 20:01:34 Re: Out of Memory errors are frustrating as heck!