Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Segfault in jit tuple deforming on arm64 due to LLVM issue
Date: 2024-08-26 14:16:41
Message-ID: CAO6_XqqFuE7eo1kq58eieF4UGYFe89KD0Uab4UxVNFsk1-HqgA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 26, 2024 at 4:33 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> IIUC this one is a random and rare crash depending on malloc() and
> perhaps also the working size of your virtual memory dart board.
> (Annoyingly, I had tried to reproduce this quite a few times on small ARM
> systems when earlier reports came in, d'oh!).

allocateMappedMemory, which is used when creating sections, eventually
calls mmap[1] rather than malloc. So the amount of shared memory
configured may be a factor in triggering the issue.
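
To make the failure mode concrete, here is a minimal sketch of the kind
of range check involved. It assumes the limit in question is the reach
of an AArch64 ADRP instruction (a signed 21-bit count of 4 KiB pages,
so roughly +/-4 GiB), which this message does not spell out. The helper
name adrp_can_reach is hypothetical and is not part of LLVM or
PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an LLVM or PostgreSQL function).
 *
 * Assumption: the relocation that overflows is ADRP-style, i.e. it
 * encodes the distance between the 4 KiB page of the instruction and
 * the 4 KiB page of the target as a signed 21-bit number of pages.
 * Returns true if `to` is reachable from an instruction at `from`.
 */
static bool
adrp_can_reach(uint64_t from, uint64_t to)
{
    /* Page numbers fit comfortably in int64_t after the shift. */
    int64_t page_delta = (int64_t) (to >> 12) - (int64_t) (from >> 12);

    /* Signed 21-bit range: [-2^20, 2^20 - 1] pages. */
    return page_delta >= -(INT64_C(1) << 20) &&
           page_delta < (INT64_C(1) << 20);
}
```

If each section comes from its own mmap call, nothing guarantees that
two sections end up close enough for a check like this to pass. A
large shared memory mapping sitting between them makes a failing
placement more likely.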

My first attempts to reproduce the issue from scratch weren't
successful either. However, trying again with different values of
shared_buffers, I've managed to trigger the issue somewhat reliably.

On a clean Ubuntu jammy, I've compiled the current PostgreSQL
REL_14_STABLE (6bc2bfc3) with the following options:
CLANG=clang-14 ../configure --enable-cassert --enable-debug \
    --prefix ~/.local/ --with-llvm

Set "shared_buffers = '4GB'" in the configuration. More may be needed
but 4GB was enough for me.

Create a table with multiple partitions with pgbench. The goal is to
have a jit module big enough to trigger the issue.
pgbench -i --partitions=64

Then run the following query with jit forcefully enabled:
psql options=-cjit_above_cost=0 -c 'SELECT count(bid) from pgbench_accounts;'

If the issue is triggered, the query should either segfault the
backend or get stuck in an infinite loop.

> Ultimately, if it doesn't work, and doesn't get fixed, it's hard for
> us to do much about it. But hmm, this is probably madness... I wonder
> if it would be feasible to detect address span overflow ourselves at a
> useful time, as a kind of band-aid defence...

There's a possible alternative, but it's definitely in the same
category as the hot-patching idea. llvmjit uses
LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager to
create the ObjectLinkingLayer, which is created with the default
SectionMemoryManager[2]. It should be possible to provide a modified
SectionMemoryManager that allocates all sections in a single block,
and it could be restricted to the arm64 architecture. Part of me says
this is probably a bad idea, but on the other hand LLVM provides this
way to plug in a custom allocator, and it would fix the issue...
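
As a rough illustration of the "single block" idea only, and not of
LLVM's actual SectionMemoryManager interface, the sketch below reserves
one contiguous mapping up front and carves every section out of it, so
the distance between any two sections is bounded by the size of the
reservation. The SectionArena type and the arena_* functions are
hypothetical names. A real implementation would also have to handle
per-section permissions (code vs. data) and hook into LLVM's memory
manager callbacks, which are left out here.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena: one contiguous region that all sections share. */
typedef struct SectionArena
{
    uint8_t *base;              /* start of the single mmap'ed block */
    size_t   size;              /* total bytes reserved */
    size_t   used;              /* bytes handed out so far */
} SectionArena;

/* Reserve the whole block with a single mmap call; 0 on success. */
static int
arena_init(SectionArena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/*
 * Bump-allocate one section inside the block. `align` must be a power
 * of two no larger than the page size (the base is page-aligned, so
 * aligning the offset also aligns the address). Returns NULL when the
 * block is exhausted instead of falling back to a separate mapping.
 */
static void *
arena_alloc_section(SectionArena *a, size_t size, size_t align)
{
    size_t start;

    assert(align != 0 && (align & (align - 1)) == 0);
    start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || size > a->size - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Release the whole block at once. */
static void
arena_destroy(SectionArena *a)
{
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

The point of returning NULL on exhaustion is that a silent fallback to
a second mmap would reintroduce the problem: the new mapping could
again land arbitrarily far from the first one.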

[1] https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/Support/Unix/Memory.inc#L115-L117
[2] https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/ExecutionEngine/Orc/OrcV2CBindings.cpp#L967-L973
