Re: Out of Memory errors are frustrating as heck!

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Gunther <raj(at)gusw(dot)net>, pgsql-performance(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-20 23:46:35
Message-ID: 20190420234635.lzbihj3she4btao6@development
Lists: pgsql-performance

On Sat, Apr 20, 2019 at 06:20:15PM -0400, Tom Lane wrote:
>Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
>> On Sat, Apr 20, 2019 at 04:46:03PM -0400, Tom Lane wrote:
>>> Maybe we just need to account for the per-batch buffers while estimating
>>> the amount of memory used during planning. That would force this case
>>> into a mergejoin instead, given that work_mem is set so small.
>
>> Do you mean by adding disable_cost if work_mem is so small that it's estimated
>> to be exceeded ?
>
>No, my point is that ExecChooseHashTableSize fails to consider the
>I/O buffers at all while estimating hash table size. It's not
>immediately obvious how to factor that in, but we should.
>
>If Tomas is right that there's also an underestimate of the number
>of rows here, that might not solve Gunther's immediate problem; but
>it seems like a clear costing oversight.
>
>There's also the angle that the runtime code acts as though increasing
>the number of batches is free, while it clearly isn't when you think
>about the I/O buffers. So at some point we should probably stop
>increasing the number of batches on the grounds of needing too many
>buffers.

Yes. I think it might be partially due to the cost being hidden elsewhere.
The hashjoin code only really deals with the array of pointers to BufFile,
not with the BufFiles themselves. That may have looked insignificant for
common cases, but for these corner cases it clearly matters quite a bit.

So yes, ExecChooseHashTableSize() needs to consider this memory and check
if doubling the number of batches has any chance of actually improving
things, because at some point the BufFile memory starts to dominate and
would just force us to do more and more batches.

But I think we should also consider this before even creating the hash
join path - see if the expected number of batches has any chance of
fitting into work_mem, and if not then just not create the path at all.
Just like we do for hash aggregate, for example. It's not going to solve
cases like this one (with underestimates), but it seems reasonable. Then
again, maybe such paths would never get chosen anyway, because the merge
join would automatically come out cheaper? Not sure.

Also, I wonder if we really need 8kB buffers here. Would it make sense to
allow smaller buffers in some cases? Say, 1kB. It's not going to save us,
but it's still 8x better than now.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
