AW: Re: BUG #18471: Possible JIT memory leak resulting in signal 11:Segmentation fault on ARM

From:	joachim(dot)haecker-becker(at)arcor(dot)de
To:	"Dmitry Dolgov" <9erthalion6(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	AW: Re: BUG #18471: Possible JIT memory leak resulting in signal 11:Segmentation fault on ARM
Date:	2024-05-22 06:40:07
Message-ID:	673ce6ea298c4f989a34d63e1d13a8fa@arcor.de
Lists:	pgsql-bugs

Hi Dmitry,

thanks for looking into this.

Maybe it is a combination of JIT and some other postgres config changes we have in our environment? I will try to reproduce with a blank config and only change the JIT settings.

These are the settings where the source is <> default:

name	setting	unit
autovacuum_analyze_scale_factor	0.03	
autovacuum_max_workers	6	
autovacuum_naptime	300	s
autovacuum_vacuum_insert_scale_factor	0.05	
autovacuum_vacuum_scale_factor	0.03	
autovacuum_vacuum_threshold	1000	
client_connection_check_interval	30000	ms
default_text_search_config	pg_catalog.english	
dynamic_shared_memory_type	posix	
effective_cache_size	1048576	8kB
enable_partitionwise_aggregate	on	
enable_partitionwise_join	on	
hash_mem_multiplier	1.5	
jit	on	
jit_above_cost	1	
jit_inline_above_cost	1	
jit_optimize_above_cost	1	
listen_addresses	*	
log_destination	jsonlog	
log_file_mode	640	
log_lock_waits	on	
log_rotation_size	102400	kB
log_timezone	Etc/UTC	
logging_collector	on	
maintenance_work_mem	1048576	kB
max_connections	150	
max_locks_per_transaction	1024	
max_parallel_workers	8	
max_parallel_workers_per_gather	2	
max_wal_size	2048	MB
min_wal_size	80	MB
random_page_cost	1	
shared_buffers	786432	8kB
TimeZone	Etc/UTC	
work_mem	512000	kB

The docker container has a 6gb shm_size.

Let me know if there is anything else I can provide to get this resolved.

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Sent: 21.05.2024 18:08
To: <joachim(dot)haecker-becker(at)arcor(dot)de>, <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18471: Possible JIT memory leak resulting in signal 11:Segmentation fault on ARM

> On Fri, May 17, 2024 at 01:13:06PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 18471
> Logged by: Joachim Haecker-Becker
> Email address: joachim(dot)haecker-becker(at)arcor(dot)de
> PostgreSQL version: 16.3
> Operating system: Debian Bookworm
> Description:
>
> We have a reproducible way to force a postgres process to consume more and
> more RAM until it crashes on ARM.
> The same works on X86 without any issue.
> With jit=off it runs on ARM as well.
>
> We run into this situation in a real-life database situation with a lot of
> joins and aggregate functions.
> The following code is just a mock to reproduce a similar situation without
> needing access to our real data.
> This issue blocks us from upgrading our ARM-hosted databases into something
> newer than 14.7.

I think it would be useful to know how much of a memory difference we are talking about and, just to make everything clear, how exactly postgres crashes (OOM kill, I assume)? It's important to differentiate between the case "ARM with jit crashes, ARM without jit doesn't" and "ARM with jit crashes, ARM without jit crashes with even more columns" (the same goes for x86).

I've tried to reproduce it on an arm64 VM (16.3 build with llvm 17), and although I could observe some difference in memory consumption between JIT on/off, it wasn't huge (around 10% or so).

Running it under valgrind shows only complaints about memory allocated for bitcode modules, which is expected -- as far as I recall postgres is somewhat wasteful when it comes to allocating memory for those modules, even more so for parallel workers. This is the case here, where there is a growing number of parallel hash workers. This would not explain any difference from x86 of course, but there might be different baseline memory consumption for different architectures.
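
For orientation, the memory-related values in the settings list of this message can be converted into MiB. The following is only a minimal sketch: it assumes PostgreSQL's convention that kB means 1024 bytes, copies the numbers from the message, and uses illustrative helper names (`to_mib`, `UNIT_BYTES`) that are not part of PostgreSQL. It does not diagnose the reported crash.

```python
# Values copied from the non-default settings listed in the message.
# Each value is a count of the unit shown in the "unit" column.
SETTINGS = {
    "shared_buffers": (786432, "8kB"),
    "effective_cache_size": (1048576, "8kB"),
    "work_mem": (512000, "kB"),
    "maintenance_work_mem": (1048576, "kB"),
}

# Assumption: PostgreSQL memory units are binary (1 kB = 1024 bytes).
UNIT_BYTES = {"kB": 1024, "8kB": 8 * 1024, "MB": 1024 * 1024}

HASH_MEM_MULTIPLIER = 1.5  # hash_mem_multiplier from the same list


def to_mib(value, unit):
    """Convert a (value, unit) pair from the settings list into MiB."""
    return value * UNIT_BYTES[unit] / (1024 * 1024)


for name, (value, unit) in SETTINGS.items():
    print(f"{name}: {to_mib(value, unit):.0f} MiB")

# Memory limit for a single hash-based operation:
# work_mem multiplied by hash_mem_multiplier.
hash_limit_mib = to_mib(*SETTINGS["work_mem"]) * HASH_MEM_MULTIPLIER
print(f"per-hash-operation limit: {hash_limit_mib:.0f} MiB")
```

With the values from the message this prints 6144 MiB for shared_buffers, 8192 MiB for effective_cache_size, 500 MiB for work_mem, 1024 MiB for maintenance_work_mem, and a 750 MiB limit per hash-based operation. That limit applies per operation and per process, so a plan with several parallel hash workers may use a multiple of it.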

Attachment	Content-Type	Size
unknown_filename	text/html	6.8 KB

Responses

Re: Re: BUG #18471: Possible JIT memory leak resulting in signal 11:Segmentation fault on ARM at 2024-05-22 18:22:12 from Clemens Eisserer
