Hi Dmitry,

thanks for looking into this. 

Maybe it is a combination of JIT and some other postgres config changes we have in our environment?
I will try to reproduce with a blank config and only change the JIT settings.
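
Concretely, something like this on top of an otherwise default config should
be enough to force JIT for the test query (just a sketch, same values as in
the list further down; SET at session level for a quick test, the same lines
can also go into postgresql.conf):

  -- force JIT compilation even for cheap plans
  SET jit = on;
  SET jit_above_cost = 1;
  SET jit_inline_above_cost = 1;
  SET jit_optimize_above_cost = 1;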

These are the settings where source <> default (see the query sketch after the list):
 
 name                                  | setting            | unit
---------------------------------------+--------------------+------
 autovacuum_analyze_scale_factor       | 0.03               |
 autovacuum_max_workers                | 6                  |
 autovacuum_naptime                    | 300                | s
 autovacuum_vacuum_insert_scale_factor | 0.05               |
 autovacuum_vacuum_scale_factor        | 0.03               |
 autovacuum_vacuum_threshold           | 1000               |
 client_connection_check_interval      | 30000              | ms
 default_text_search_config            | pg_catalog.english |
 dynamic_shared_memory_type            | posix              |
 effective_cache_size                  | 1048576            | 8kB
 enable_partitionwise_aggregate        | on                 |
 enable_partitionwise_join             | on                 |
 hash_mem_multiplier                   | 1.5                |
 jit                                   | on                 |
 jit_above_cost                        | 1                  |
 jit_inline_above_cost                 | 1                  |
 jit_optimize_above_cost               | 1                  |
 listen_addresses                      | *                  |
 log_destination                       | jsonlog            |
 log_file_mode                         | 640                |
 log_lock_waits                        | on                 |
 log_rotation_size                     | 102400             | kB
 log_timezone                          | Etc/UTC            |
 logging_collector                     | on                 |
 maintenance_work_mem                  | 1048576            | kB
 max_connections                       | 150                |
 max_locks_per_transaction             | 1024               |
 max_parallel_workers                  | 8                  |
 max_parallel_workers_per_gather       | 2                  |
 max_wal_size                          | 2048               | MB
 min_wal_size                          | 80                 | MB
 random_page_cost                      | 1                  |
 shared_buffers                        | 786432             | 8kB
 TimeZone                              | Etc/UTC            |
 work_mem                              | 512000             | kB
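
For completeness, a query along these lines against pg_settings reproduces
the list above (just a sketch):

  SELECT name, setting, unit
  FROM pg_settings
  WHERE source <> 'default'
  ORDER BY name;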


The Docker container runs with shm_size set to 6 GB.

Let me know if there is anything else I can provide to get this resolved.

 
From: Dmitry Dolgov <9erthalion6@gmail.com>
Sent: 21.05.2024 18:08
To: <joachim.haecker-becker@arcor.de>, <pgsql-bugs@lists.postgresql.org>
Subject: Re: BUG #18471: Possible JIT memory leak resulting in signal 11: Segmentation fault on ARM
 
> On Fri, May 17, 2024 at 01:13:06PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 18471
> Logged by: Joachim Haecker-Becker
> Email address: joachim.haecker-becker@arcor.de
> PostgreSQL version: 16.3
> Operating system: Debian Bookworm
> Description:
>
> We have a reproducible way to force a postgres process to consume more and
> more RAM until it crashes on ARM.
> The same works on X86 without any issue.
> With jit=off it runs on ARM as well.
>
> We run into this situation in a real-life database situation with a lot of
> joins and aggregate functions.
> The following code is just a mock to reproduce a similar situation without
> needing access to our real data.
> This issue blocks us from upgrading our ARM-hosted databases to anything
> newer than 14.7.

I think it would be useful to know how much of a memory difference we are
talking about and, just to make everything clear, how exactly postgres
crashes (an OOM kill, I assume)? It's important to differentiate between
the case "ARM with jit crashes, ARM without jit doesn't" and "ARM with
jit crashes, ARM without jit crashes too, just with even more columns"
(the same goes for x86).
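
One way to quantify this while the reproducer is running would be to dump
the memory contexts of the backend executing the query from another session
(sketch; pg_log_backend_memory_contexts is available since v14, the pid is
a placeholder to be looked up in pg_stat_activity):

  -- replace 12345 with the pid of the backend running the reproducer
  SELECT pg_log_backend_memory_contexts(12345);

Repeating that a few times during the run would show whether the growth
happens in tracked memory contexts at all, or is only visible in the
process RSS (e.g. LLVM's own allocations, which are not palloc'd and hence
don't show up there).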

I've tried to reproduce it on an arm64 VM (16.3 built with llvm 17), and
although I could observe some difference in memory consumption between
JIT on/off, it wasn't huge (around 10% or so). Running it under valgrind
shows only complaints about memory allocated for bitcode modules, which
is expected -- as far as I recall postgres is somewhat wasteful when it
comes to allocating memory for those modules, even more so for parallel
workers. That is the case here, where there is a growing number of
parallel hash workers. This would not explain any difference from x86 of
course, but there might be a different baseline memory consumption for
different architectures.
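
If the growth indeed tracks the parallel hash workers, a quick sanity check
(sketch) would be to rerun the reproducer with parallelism disabled and
compare the backend's memory usage:

  -- rule out per-worker bitcode module allocations
  SET max_parallel_workers_per_gather = 0;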