Re: Failing to allocate memory when I think it shouldn't

From: Thomas Ziegler <thomas(dot)ziegler(at)holmsecurity(dot)com>
To: Christoph Moench-Tegeder <cmt(at)burggraben(dot)net>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Failing to allocate memory when I think it shouldn't
Date: 2024-09-17 07:40:15
Message-ID: f8abb93a-5b03-4c0e-a69f-1f3cdfd4c4d4@holmsecurity.com
Lists: pgsql-general

Hello Christoph,

Thanks for your answer and the suggestions; they have already helped me a lot!

On 2024-09-14 22:11, Christoph Moench-Tegeder wrote:
> Hi,
>
> ## Thomas Ziegler (thomas(dot)ziegler(at)holmsecurity(dot)com):
>
> There's a lot of information missing here. Let's start from the top.
>
>> I have had my database killed by the kernel oom-killer. After that I
>> turned off memory over-committing and that is where things got weird.
> What exactly did you set? When playing with vm.overcommit, did you
> understand "Committed Address Space" and the workings of the
> overcommit accounting? This is the document:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/mm/overcommit-accounting.rst
> Hint: when setting overcommit_memory=2 you might end up with way
> less available address space than you thought you would. Also keep
> an eye on /proc/meminfo - it's sometimes hard to estimate "just off
> your cuff" what's in memory and how it's mapped. (Also, anything
> else on that machine which might hog memory?).

I set overcommit_memory=2, but completely missed 'overcommit_ratio'.
That is most probably why the database got denied the RAM a lot sooner
than I expected.
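
For anyone following along, here is a rough sketch of the arithmetic behind that limit. It assumes the formula given in the kernel documentation for overcommit_memory=2 (RAM minus huge pages, times overcommit_ratio percent, plus swap) and that vm.overcommit_kbytes is left at 0. The function name and the example machine size are made up for illustration; the kernel does this accounting in pages, so the real CommitLimit can differ slightly.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit in kB for vm.overcommit_memory=2.

    Assumes vm.overcommit_kbytes is 0, so overcommit_ratio (a percentage,
    kernel default 50) is the setting that applies.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


if __name__ == "__main__":
    # Hypothetical machine: 16 GiB of RAM, no swap, no huge pages.
    ram_kb = 16 * 1024 * 1024
    for ratio in (50, 80, 100):
        limit_gib = commit_limit_kb(ram_kb, 0, ratio) / (1024 * 1024)
        print(f"overcommit_ratio={ratio}: CommitLimit is about {limit_gib:.1f} GiB")
```

With the default ratio of 50 and no swap, only about half of the RAM can be committed, which would explain allocations being refused long before the machine looks full.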

> Finally, there's this:
>> 2024-09-12 05:18:36.073 UTC [1932776] LOG: background worker "parallel worker" (PID 3808076) exited with exit code 1
>> terminate called after throwing an instance of 'std::bad_alloc'
>> what(): std::bad_alloc
>> 2024-09-12 05:18:36.083 UTC [1932776] LOG: background worker "parallel worker" (PID 3808077) was terminated by signal 6: Aborted
> That "std::bad_alloc" sounds a lot like C++ and not like the C our
> database is written in. My first suspicion would be that you're using
> LLVM-JIT (unless you have other - maybe even your own - C++ extensions
> in the database?) and that in itself can use a good chunk of memory.
> And it looks like that exception bubbled up as a signal 6 (SIGABRT)
> which made the process terminate immediately without any cleanup,
> and after that the server has no other chance than to crash-restart.

Except for pgAudit, I don't have any extensions, so it is probably the
JIT. I had no idea there was a JIT, even though it should have been obvious.
Thanks for pointing this out!

Is the memory the JIT takes limited by 'work_mem' or will it just take
as much memory as it needs?

> I recommend starting with understanding the actual memory limits
> as set by your configuration (personally I believe that memory
> overcommit is less evil than some people think). Have a close look
> at /proc/meminfo and if possible disable JIT and check if it changes
> anything. Also if possible try starting with only a few active
> connections and increase load carefully once a steady state (in
> terms of memory usage) has been reached.

Yes, understanding the memory limits is what I was trying to do.
I was questioning my understanding, but it seems it was Linux that
tripped me up, or rather my lack of understanding there, not the
database. Memory management and /proc/meminfo still manage to confuse me.
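
One way to make /proc/meminfo less confusing is to look only at the two fields that matter for strict overcommit accounting: CommitLimit and Committed_AS. The sketch below parses meminfo-style text and reports the remaining headroom. The helper names and the sample numbers are invented for illustration; on a real system you would pass in the contents of /proc/meminfo instead of the sample.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most values are in kB; a few counters (e.g. HugePages_Total) have no unit.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """Address space (kB) that can still be committed before allocations fail."""
    return fields["CommitLimit"] - fields["Committed_AS"]


if __name__ == "__main__":
    # Illustrative sample only; replace with open("/proc/meminfo").read() on Linux.
    sample = (
        "MemTotal:       16777216 kB\n"
        "SwapTotal:             0 kB\n"
        "CommitLimit:     8388608 kB\n"
        "Committed_AS:    8000000 kB\n"
        "HugePages_Total:       0\n"
    )
    info = parse_meminfo(sample)
    print(f"headroom: {commit_headroom_kb(info)} kB")
```

When the headroom approaches zero under overcommit_memory=2, new allocations start to fail even if plenty of memory appears free or cached.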

Again, thanks for your help!

Cheers,
Thomas

p.s.: To anybody who stumbles upon this in the future,
if you set `overcommit_memory=2`, don't forget `overcommit_ratio`.
