Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)
Date: 2021-11-18 21:20:39
Message-ID: 20211118212039.GT17618@telsasoft.com
Lists: pgsql-hackers

On Wed, Nov 10, 2021 at 09:56:44AM -0600, Justin Pryzby wrote:
> Thread starting here:
> https://www.postgresql.org/message-id/20201001021609.GC8476%40telsasoft.com
>
> On Fri, Dec 18, 2020 at 05:56:07PM -0600, Justin Pryzby wrote:
> > I'm 99% sure the "bad_alloc" is from LLVM. It happened multiple times on
> > different servers (running a similar report) after setting jit=on during pg13
> > upgrade, and never happened since re-setting jit=off.
>
> Since this recurred a few times recently (now running pg14.0), and I finally
> managed to get a non-truncated corefile...

I think the reason this recurred is that, since upgrading to pg14, I no longer
had your memleak patches applied. I'd forgotten about it, but was probably
running a locally compiled postgres with your patches applied.

I should've mentioned that this crash was associated with the message from the
original problem report:

|terminate called after throwing an instance of 'std::bad_alloc'
| what(): std::bad_alloc

The leak discussed on other threads seems to be fixed by your patches - I compiled
v14 with them and have been running with no visible leaks since last week.
https://www.postgresql.org/message-id/flat/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de

As I understand it, there's still an issue with an allocation failure causing
SIGABRT rather than FATAL.
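
To illustrate the distinction (a minimal sketch, not PostgreSQL's or LLVM's actual code): in frames #2 to #6 of the trace below, __cxa_throw goes straight to std::terminate(), which by default calls abort(), so no handler was found for the exception. If the exception were caught at the boundary between the C++ library and its C caller, the failure could be reported as an ordinary error. The names allocate_in_library and try_allocate are hypothetical, and the failure is simulated by throwing directly rather than by exhausting memory.

```cpp
#include <new>
#include <string>

// Hypothetical stand-in for a library call that runs out of memory.
// Throwing directly avoids depending on the platform's overcommit policy.
void allocate_in_library(bool fail)
{
    if (fail)
        throw std::bad_alloc();
}

// Hypothetical boundary wrapper: translate the C++ exception into a status
// the caller can report as a normal error. Without the catch clause, an
// exception that nobody handles ends in std::terminate(), whose default
// handler calls abort() and so raises SIGABRT.
bool try_allocate(bool fail, std::string &errmsg)
{
    try
    {
        allocate_in_library(fail);
        return true;
    }
    catch (const std::bad_alloc &)
    {
        errmsg = "out of memory";
        return false;
    }
}
```

A caller would check the return value and raise its own error (in a backend, something of FATAL or ERROR severity) instead of dying with a core dump.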

It took me several tries to get the corefile, since the process was huge because
of the leak (and abrtd wanted to truncate it, nullifying its utility).

-rw-------. 1 postgres postgres 8.4G Nov 10 08:57 /var/lib/pgsql/14/data/core.31345

I installed more debug packages to get a fuller stacktrace.

#0 0x00007f2497880337 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f2497881a28 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f2487cbf265 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#3 0x00007f2487c66696 in __cxxabiv1::__terminate(void (*)()) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#4 0x00007f2487c666c3 in std::terminate() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#5 0x00007f2487c687d3 in __cxa_throw () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#6 0x00007f2487c686cd in operator new(unsigned long) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#7 0x00007f2486477b9c in allocateBuckets (this=0x2ff7f38, this=0x2ff7f38, Num=<optimized out>) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:753
No locals.
#8 llvm::DenseMap<llvm::APInt, std::unique_ptr<llvm::ConstantInt, std::default_delete<llvm::ConstantInt> >, llvm::DenseMapAPIntKeyInfo, llvm::detail::DenseMapPair<llvm::APInt, std::unique_ptr<llvm::ConstantInt, std::default_delete<llvm::ConstantInt> > > >::grow (this=this@entry=0x2ff7f38, AtLeast=<optimized out>)
at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:691
OldNumBuckets = 33554432
OldBuckets = 0x7f23f3e42010
#9 0x00007f2486477f29 in grow (AtLeast=<optimized out>, this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:461
No locals.
#10 InsertIntoBucketImpl<llvm::APInt> (TheBucket=<optimized out>, Lookup=..., Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:510
NewNumEntries = <optimized out>
EmptyKey = <optimized out>
#11 InsertIntoBucket<llvm::APInt const&> (Key=..., TheBucket=<optimized out>, this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:471
No locals.
#12 FindAndConstruct (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:271
TheBucket = <optimized out>
#13 operator[] (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:275
No locals.
#14 llvm::ConstantInt::get (Context=..., V=...) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:550
pImpl = 0x2ff7eb0
#15 0x00007f2486478263 in llvm::ConstantInt::get (Ty=0x2ff85a8, V=<optimized out>, isSigned=isSigned@entry=false) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:571
No locals.
#16 0x00007f248648673d in LLVMConstInt (IntTy=<optimized out>, N=<optimized out>, SignExtend=SignExtend@entry=0) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Core.cpp:952
No locals.
#17 0x00007f2488f66c18 in l_ptr_const (type=0x3000650, ptr=<optimized out>) at ../../../../src/include/jit/llvmjit_emit.h:29
c = <optimized out>
#18 llvm_compile_expr (state=<optimized out>) at llvmjit_expr.c:246
op = 0x1a5317690
opcode = EEOP_OUTER_VAR
opno = 5
parent = <optimized out>
funcname = 0x1a53184e8 "evalexpr_4827_151"
context = 0x1ba79b8
b = <optimized out>
mod = 0x1a5513d30
eval_fn = <optimized out>
entry = <optimized out>
v_state = 0x1a5ce09e0
v_econtext = 0x1a5ce0a08
v_isnullp = 0x1a5ce0a30
v_tmpvaluep = 0x1a5ce0aa8
v_tmpisnullp = 0x1a5ce0b48
starttime = {tv_sec = 10799172, tv_nsec = 781670770}
endtime = {tv_sec = 7077194792, tv_nsec = 0}
__func__ = "llvm_compile_expr"
[...]
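
For scale, here is a rough back-of-the-envelope sketch of what the DenseMap in frame #8 was asking for. Only OldNumBuckets comes from the trace. The 24-byte bucket size is an assumption (a 16-byte APInt key plus an 8-byte unique_ptr on x86-64), and since AtLeast is optimized out, the trace does not show whether grow() was doubling the table or rehashing at the same size; the doubled figure assumes the former. The 3/4 load factor is how I recall DenseMap deciding to grow, so treat the entry count as an estimate too.

```cpp
#include <cstdint>

// OldNumBuckets as printed in frame #8 of the trace (2^25).
constexpr std::uint64_t kOldNumBuckets = 33554432;

// ASSUMPTION: bytes per bucket for DenseMapPair<APInt, unique_ptr<ConstantInt>>
// on x86-64. This value is not taken from the trace.
constexpr std::uint64_t kAssumedBucketBytes = 24;

// Size in bytes of a bucket array with the given number of buckets.
constexpr std::uint64_t table_bytes(std::uint64_t buckets)
{
    return buckets * kAssumedBucketBytes;
}

// Existing table, which stays allocated while the new one is being filled.
constexpr std::uint64_t kOldTableBytes = table_bytes(kOldNumBuckets);

// New table, ASSUMING the grow doubles the bucket count.
constexpr std::uint64_t kNewTableBytes = table_bytes(kOldNumBuckets * 2);

// Entries needed to reach a 3/4 load factor in the old table.
constexpr std::uint64_t kEntriesAtThreeQuarters = kOldNumBuckets / 4 * 3;
```

Under those assumptions the old table is 768 MiB, the requested one is 1.5 GiB, and the context's integer-constant map holds on the order of 25 million entries, which is consistent with constants created via l_ptr_const() accumulating over a long-lived session.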
