Re: Server crash with parallel workers with Postgres 14.7

From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: José Lorenzo Urdaneta Rodriguez <lorenzo(at)kronor(dot)io>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Server crash with parallel workers with Postgres 14.7
Date: 2023-05-30 04:56:09
Message-ID: CAJKUy5jCaxACx8hfaHmazeTKYhGjVErQzftgY_N4T30FRYRAzQ@mail.gmail.com
Lists: pgsql-bugs

On Mon, May 29, 2023 at 10:38 AM José Lorenzo Urdaneta Rodriguez
<lorenzo(at)kronor(dot)io> wrote:
>
> I just wanted to confirm this was the right place to report the issue. Can anyone confirm, please?
>

yes, this is the right place to report. there is no guaranteed SLA,
though, and because this report is missing key details (read below)
a lot of people will not follow up on it

> On Fri, 19 May 2023 at 11:14, José Lorenzo Urdaneta Rodriguez <lorenzo(at)kronor(dot)io> wrote:
>>
>> Hi,
>>
>> I've been having intermittent server crashes when executing certain queries. I have narrowed the cases to queries that scan large tables, and the most recent cases when the planner uses parallel workers.
>>

by intermittent, do you mean it is not reproducible every time? I mean,
just executing this query does not always cause the crash?

>> I managed to collect a core dump of the crash, here's the result of `bt` using `gdb`:
>>
>> ```
>> Reading symbols from /usr/lib/postgresql/14/bin/postgres...
>> Reading symbols from /usr/lib/debug/.build-id/4a/4ff1b11a45a428e502b992679932bc188f92c1.debug...
>> [New LWP 3008897]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
>> Core was generated by `postgres: 14/kronor: parallel worker for PID 3008825 '.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0 0x0000fffea2ac7a68 in ?? ()
>> (gdb) bt
>> #0 0x0000fffea2ac7a68 in ?? ()
>> #1 0x0000aaaabb378020 in ExecProcNode (node=0xaaaae311d068) at ./build/../src/include/executor/executor.h:257
>> #2 ExecAppend (pstate=0xaaaae30dd358) at ./build/../src/backend/executor/nodeAppend.c:360
>> #3 0x0000aaaabb378020 in ExecProcNode (node=0xaaaae30dd358) at ./build/../src/include/executor/executor.h:257
>> #4 ExecAppend (pstate=0xaaaae30bf258) at ./build/../src/backend/executor/nodeAppend.c:360
>> #5 0x0000000000000001 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> ```
>>

this backtrace doesn't have all the debug symbols. did you install the
postgresql-14-dbgsym package?
without the names of the functions we don't really know what is happening.
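
to make the point concrete, here is a rough Python sketch that scans
gdb "bt" output for frames whose function name is printed as "??".
the helper name unresolved_frames and the regular expression are my
own assumptions for illustration, not part of gdb or PostgreSQL, and
the sample frames are copied from the backtrace quoted above.

```python
import re

# Matches lines such as "#1 0x0000aaaabb378020 in ExecProcNode (" or
# "#2 ExecAppend (". The address part is optional in gdb output.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def unresolved_frames(backtrace):
    """Return the frame numbers whose function name gdb printed as '??'."""
    frames = []
    for line in backtrace.splitlines():
        # Drop email quote markers such as ">> " before matching.
        line = line.lstrip("> ").strip()
        match = FRAME_RE.match(line)
        if match and match.group(2) == "??":
            frames.append(int(match.group(1)))
    return frames


# Frames copied from the backtrace in the original report.
SAMPLE = """\
>> #0 0x0000fffea2ac7a68 in ?? ()
>> #1 0x0000aaaabb378020 in ExecProcNode (node=0xaaaae311d068) at ./build/../src/include/executor/executor.h:257
>> #2 ExecAppend (pstate=0xaaaae30dd358) at ./build/../src/backend/executor/nodeAppend.c:360
>> #5 0x0000000000000001 in ?? ()
"""

print(unresolved_frames(SAMPLE))
```

any frame number it reports is one where we cannot tell which function
was running.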

also, have you installed any extensions? you can execute "\dx" in psql
to see which extensions are installed (remember that extensions are
installed per database, so executing that command in only one database
is not enough).
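
since the check has to be repeated in every database, a small sketch
like the one below could generate one psql call per database. the
function name dx_commands and the database names are hypothetical
(take the real list from "psql -l"); the only psql options used are
-X (skip psqlrc), -d (database) and -c (run one command).

```python
import shlex


def dx_commands(databases):
    """Build one psql invocation per database that runs \\dx there."""
    return [
        "psql -X -d {} -c {}".format(shlex.quote(db), shlex.quote(r"\dx"))
        for db in databases
    ]


# Hypothetical database names, only for illustration.
for command in dx_commands(["postgres", "app_db"]):
    print(command)
```

each printed line can be pasted into a shell as a user that is allowed
to connect to that database.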

>> The query that was running was:

- a big query goes here - the query itself is not useful if you don't
provide the table structures and a minimal amount of (fake) data
to make the problem appear

>> The plan for this query was:
>>
- A typical plan for a partitioned table -

>> JIT:
>> Functions: 375
>> Options: Inlining false, Optimization false, Expressions true, Deforming true
>> ```
>>

the backtrace says this is a segmentation fault, but anyway I would
suggest deactivating JIT before running the query: just "SET jit TO off;"
should be enough.
then try to cause the problem again. JIT is known to have a memory leak
problem (which is not consistent with a segmentation fault, but who
knows)

>> Operating System: Ubuntu 20
>> Architecture: aarch64
>> Server version: 14.7
>>

try updating to v14.8, which includes some fixes

--
Jaime Casanova
Consultores de PostgreSQL
SYSTEMGUARDS S.A.
