Re: PostgreSQL 17 Segmentation Fault

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Cameron Vogt <cvogt(at)automaticcontrols(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Sean Massey <sean(dot)f(dot)massey(at)gmail(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: PostgreSQL 17 Segmentation Fault
Date: 2024-10-04 19:11:15
Message-ID: 82ffe6ad-a219-4255-9fea-daff619a1ca0@vondra.me
Lists: pgsql-bugs

Hi,

Thanks for the information. Per the backtrace, the failure happens in the
LLVM JIT code for the nestloop/seqscan, so it has to be in this part of
the plan:

->  Nested Loop  (cost=0.42..6074.84 rows=117 width=641)
      ->  Parallel Seq Scan on tasks__projects  (cost=0.00..2201.62 rows=745 width=16)
            Filter: (gid = '1138791545416725'::text)
      ->  Index Scan using tasks_pkey on tasks tasks_1  (cost=0.42..5.20 rows=1 width=102)
            Index Cond: (gid = tasks__projects._sdc_source_key_gid)
            Filter: ((NOT completed) AND (name <> ''::text))

It's not clear why this part should consume a lot of memory, though.
It's possible the memory is consumed elsewhere, and this is simply the
straw that breaks the camel's back ...

Presumably it takes a while for the query to consume a lot of memory and
crash - can you attach a debugger to it after it allocates a lot of
memory (but before the crash), and run this:

call MemoryContextStats(TopMemoryContext)

That should write memory context stats to the server log. Perhaps that
will tell us which part of the query allocates memory.
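
A rough sketch of how that might look (12345 is just a placeholder for
the actual backend PID, which you can get from pg_stat_activity or top):

SELECT pid, state, query FROM pg_stat_activity WHERE state = 'active';

$ gdb -p 12345
(gdb) call MemoryContextStats(TopMemoryContext)
(gdb) detach
(gdb) quit

The backend resumes after detach, so you can re-attach a bit later and
compare the stats to see which context keeps growing.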

Next, try running the query with jit=off. If that resolves the problem,
it may be another JIT issue, although the fact that the query completes
with lower shared buffers makes that seem unlikely.
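
This can be done just in the session running the query, no restart
needed (the query itself is elided here, of course):

SET jit = off;
-- then run the failing query in the same session
RESET jit;

If it only crashes with jit=on, that points at the JIT code.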

The plan has a bunch of hash joins. I wonder if that might be causing
issues, because the hash tables may be kept until the end of the query,
and each may be up to 64MB (you have work_mem=32MB, but since PG13 hash
tables can use work_mem * hash_mem_multiplier, which now defaults to
2.0). The row estimates are pretty low, but could it be that the real
row counts are much higher? Did you run ANALYZE after the upgrade?
Maybe try with lower work_mem? Both checks are sketched below.
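
Something like this (the 8MB value is just an arbitrary lower setting
to test with):

ANALYZE;               -- refresh statistics after the upgrade
SET work_mem = '8MB';  -- well below the current 32MB
-- then run the query under EXPLAIN (ANALYZE, BUFFERS)

EXPLAIN ANALYZE shows estimated vs. actual row counts for each node, so
it should be obvious if the estimates are way off.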

One last thing you should check is memory overcommit. Chances are it's
set just low enough for the query to hit it with shared_buffers=4GB,
but not with 3GB. In that case you may need to tune it a bit. See
/proc/meminfo and /proc/sys/vm/overcommit_*.
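
For example:

cat /proc/sys/vm/overcommit_memory   # 0 = heuristic, 1 = always, 2 = strict
cat /proc/sys/vm/overcommit_ratio    # only matters in strict mode
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

In strict mode (overcommit_memory=2), CommitLimit is roughly
swap + RAM * overcommit_ratio/100, and allocations start failing once
Committed_AS approaches it. A larger shared_buffers counts against that
limit, which would explain why 4GB crashes and 3GB does not.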

regards

--
Tomas Vondra
