From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Don't clean up LLVM state when exiting in a bad way |
Date: | 2021-09-14 04:00:00 |
Message-ID: | 1343a091-013a-2652-9c00-ff48612eca4b@gmail.com |
Lists: | pgsql-hackers |
Hello hackers,
14.09.2021 04:32, Andres Freund wrote:
> On 2021-09-07 14:44:39 -0500, Justin Pryzby wrote:
>> On Tue, Sep 07, 2021 at 12:27:27PM -0700, Andres Freund wrote:
>>> I think this is a tad too strong. We should continue to clean up on exit as
>>> long as the error didn't happen while we're already inside llvm
>>> code. Otherwise we loose some ability to find leaks. How about checking in the
>>> error path whether fatal_new_handler_depth is > 0, and skipping cleanup in
>>> that case? Because that's precisely when it should be unsafe to reenter
>>> LLVM.
> The more important reason is actually profiling information that needs to be
> written out.
>
> I've now pushed a fix to all relevant branches. Thanks all!
>
I encountered a similar issue last week, but found this discussion only
after the commit.
I'm afraid the problem is not completely gone yet. I've reproduced a similar
crash (on edb4d95d) with:
echo "statement_timeout = 50
jit_optimize_above_cost = 1
jit_inline_above_cost = 1
parallel_setup_cost=0
parallel_tuple_cost=0
" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config make check
parallel group (11 tests): memoize explain hash_part partition_info reloptions tuplesort compression partition_aggregate indexing partition_prune partition_join
    partition_join    ... FAILED (test process exited with exit code 2)    1815 ms
    partition_prune   ... FAILED (test process exited with exit code 2)    1779 ms
    reloptions        ... ok    146 ms
I've extracted the crash-causing fragment from the partition_prune test
to reproduce the segfault reliably (see the patch attached).
The segfault stack is:
Core was generated by `postgres: parallel worker for PID 12029 '.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f045e0a88ca in notifyFreed (K=<optimized out>, Obj=..., this=<optimized out>)
    at /usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:485
485 Listener->NotifyFreeingObject(Obj);
(gdb) bt
#0 0x00007f045e0a88ca in notifyFreed (K=<optimized out>, Obj=..., this=<optimized out>)
    at /usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:485
#1 operator() (K=<optimized out>, Obj=..., __closure=<optimized out>)
    at /usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:226
#2 std::_Function_handler<void (unsigned long, llvm::object::ObjectFile const&), llvm::OrcCBindingsStack::OrcCBindingsStack(llvm::TargetMachine&, std::function<std::unique_ptr<llvm::orc::IndirectStubsManager, std::default_delete<llvm::orc::IndirectStubsManager> > ()>)::{lambda(unsigned long, llvm::object::ObjectFile const&)#3}>::_M_invoke(std::_Any_data const&, unsigned long, llvm::object::ObjectFile const&) (__functor=..., __args#0=<optimized out>, __args#1=...)
    at /usr/include/c++/4.8.2/functional:2071
#3 0x00007f045e0aa578 in operator() (__args#1=..., __args#0=<optimized out>, this=<optimized out>)
    at /usr/include/c++/4.8.2/functional:2471
...
The corresponding code in OrcCBindingsStack.h is:
    void notifyFreed(orc::VModuleKey K, const object::ObjectFile &Obj) {
      for (auto &Listener : EventListeners)
        Listener->NotifyFreeingObject(Obj);
    }
So probably one of the EventListeners has become null. I see that
without debugging and profiling enabled the only listener registration
in the postgres code is LLVMOrcRegisterJITEventListener.
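To illustrate the suspicion, here is a minimal standalone sketch (not LLVM code) of the same loop shape with a null check added. The Listener struct, notifyFreedChecked() and demo() are hypothetical names made up for this example, and the int argument merely stands in for the ObjectFile reference. Note that a check like this would only help if the entry is really null; a dangling (already freed) listener pointer would still crash.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a JIT event listener; NOT the real LLVM class.
struct Listener {
    int freed = 0;
    void NotifyFreeingObject(int /*obj*/) { ++freed; }
};

// Same loop shape as the notifyFreed() quoted above, but null entries are
// skipped instead of dereferenced. Returns how many listeners were notified.
std::size_t notifyFreedChecked(const std::vector<Listener *> &listeners, int obj) {
    std::size_t notified = 0;
    for (Listener *l : listeners) {
        if (l == nullptr)
            continue;  // the unchecked loop would segfault on this entry
        l->NotifyFreeingObject(obj);
        ++notified;
    }
    return notified;
}

// Usage sketch: one null entry is skipped, the two live listeners are notified.
void demo() {
    Listener a, b;
    std::vector<Listener *> ls{&a, nullptr, &b};
    assert(notifyFreedChecked(ls, 0) == 2);
}
```

Whether the real EventListeners vector ends up holding a null or a stale pointer on LLVM 7 is exactly what I could not determine, so this only shows the failure shape, not the cause.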
With LLVM 9 on the same CentOS 7 I don't get this segfault. It also
doesn't happen on other OSes with LLVM 7. I still have no
explanation for that, but maybe there is a difference in the LLVM
configure options, e.g. like this:
https://stackoverflow.com/questions/47712670/segmentation-fault-in-llvm-pass-when-using-registerstandardpasses
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
jit-llvm-7-crash.sql | application/sql | 1.7 KB |