Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Segfault in jit tuple deforming on arm64 due to LLVM issue
Date: 2024-08-27 09:24:20
Message-ID: CAO6_XqqMf=4NqCENneqY8wd=TyrrYXfLepKcrGWNocWoQp_Prg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 27, 2024 at 1:33 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> I am sure this requires changes for various LLVM versions. I tested
> it with LLVM 14 on a Mac where I've never managed to reproduce the
> original complaint, but ... ooooh, this might be exacerbated by ASLR,
> and macOS only has a small ASLR slide window (16M or 256M apparently,
> according to me in another thread), so I'd probably have to interpose
> my own mmap() to choose some more interesting addresses, or run some
> other OS, but that's quite enough rabbit holes for one morning.

I've tested the patch, after first making sure I could trigger the
issue on master. With 4GB shared_buffers and 64 partitions, the issue
didn't happen. However, increasing to 6GB and 128 partitions triggered
it.

The architecture check in the patch was incorrect (__arch64__ instead
of __aarch64__, glad to see I'm not the only one who gets confused by
aarch64 and arm64 :)) but once fixed, the patch worked and avoided the
segfault.
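
To illustrate why a typo like this is easy to miss: an #ifdef on a
macro the compiler never defines is not an error, the guarded code is
just silently left out. The following is a minimal standalone sketch,
not code from the patch; the function names are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True only when compiling for 64-bit ARM: GCC and Clang predefine
 * __aarch64__ for that target.
 */
static bool
guarded_by_aarch64(void)
{
#ifdef __aarch64__
    return true;
#else
    return false;
#endif
}

/*
 * The misspelled guard. __arch64__ is not predefined for arm64 targets,
 * so on arm64 this compiles without any diagnostic and returns false.
 */
static bool
guarded_by_arch64_typo(void)
{
#ifdef __arch64__
    return true;
#else
    return false;
#endif
}

/*
 * Sanity check (hypothetical helper): no single target should predefine
 * both macros, so the two guards should never be active together.
 */
static void
check_guards(void)
{
    assert(!(guarded_by_aarch64() && guarded_by_arch64_typo()));
}
```

On an arm64 build, guarded_by_aarch64() returns true while the
misspelled variant returns false, which matches the symptom of the
mitigation never being compiled in.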

I've run some additional tests with different parameters:
- I've tried disabling randomize_va_space; the issue still happened
even with ASLR disabled.
- I've tested different PG versions. With 14 and 15, 4GB and 64
partitions were enough. Starting with PG 16, I had to increase
shared_buffers to 6GB and partitions to 128. I've been able to trigger
the issue on all versions from 14 to master (which was expected, but I
wanted confirmation).
- I haven't been able to reproduce this on macOS either. I've tried
removing the MemGroup.Near hint so mmap addresses would be more random,
and played with different shared_buffers and partition values, without
success.

I've modified the patch with 3 changes:
- meson.build was using the SectionMemoryManager.cpp file name; I've
replaced it with SafeSectionMemoryManager.cpp.
- Use __aarch64__ instead of __arch64__.
- Moved the architecture switch to llvm_create_object_layer, which now
goes through the normal
LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager on
non-arm64 architectures. There's no need to use the custom memory
manager on non-arm64, so it looked better to avoid it entirely when the
reserve allocation isn't needed.
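
The shape of that last change can be sketched as follows. This is a
simplified standalone illustration under stated assumptions, not the
code from the attached patch: the ObjectLayer struct and the three
function names are hypothetical stand-ins, so the example compiles
without the LLVM headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for LLVM's object layer handle. */
typedef struct ObjectLayer
{
    const char *memory_manager;     /* which manager backs the layer */
    bool        reserves_allocation;    /* custom manager reserves up front */
} ObjectLayer;

/* Stand-in for the custom path that uses SafeSectionMemoryManager. */
static ObjectLayer
create_layer_with_safe_memory_manager(void)
{
    ObjectLayer layer = {"SafeSectionMemoryManager", true};

    return layer;
}

/*
 * Stand-in for the normal path, i.e.
 * LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager.
 */
static ObjectLayer
create_layer_with_default_memory_manager(void)
{
    ObjectLayer layer = {"SectionMemoryManager", false};

    return layer;
}

/*
 * The architecture switch lives in one place: arm64 builds get the
 * custom memory manager, every other architecture keeps the stock one.
 */
static ObjectLayer
create_object_layer(void)
{
    ObjectLayer layer;

#if defined(__aarch64__)
    layer = create_layer_with_safe_memory_manager();
#else
    layer = create_layer_with_default_memory_manager();
#endif
    assert(layer.memory_manager != NULL);
    return layer;
}
```

Keeping the switch inside the layer-creation function means
non-arm64 builds never touch the custom memory manager at all.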

Attachment                                            Content-Type              Size
v3-0001-XXX-LLVM-ARM-relocation-bug-mitigation.patch  application/octet-stream  29.6 KB
