Re: server crash on raspberry pi for large queries

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Matthew Clark <mclark(at)drmatthewclark(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: server crash on raspberry pi for large queries
Date: 2024-08-21 02:38:02
Message-ID: CAApHDvoSLgz_WLK8XKNbkLAQqk-BRva0+CdR26Rp=7vXnVAycw@mail.gmail.com
Lists: pgsql-bugs

On Wed, 21 Aug 2024 at 07:18, Matthew Clark <mclark(at)drmatthewclark(dot)com> wrote:
> I can try to make a test case, essentially a large table, then "select count(*)" from table. The select works for smaller tables.

It would be good to figure out which instruction is being executed
when the crash happens. Would you be able to attach gdb to the backend
and trigger the crash [1]? I think gdb should print out the problem
instruction.

Looking at master, I see we call LLVMGetHostCPUFeatures() to figure
out which CPU features are available. I've not yet looked to see if
that's changed since PG15. If we knew the instruction that's being
executed here then we might be able to figure out if it's down to
cpuid advertising a feature that the CPU doesn't fully support (maybe
unlikely?) or if it's LLVM that's accidentally emitting code that does
not work on the CPU.
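
One way to cross-check what the CPU claims to support is to look at
the feature list the Linux kernel reports in /proc/cpuinfo. The sketch
below is only an illustration under that assumption: it reads the
kernel's view, not what LLVMGetHostCPUFeatures() returns, and the
function names parse_cpu_features and read_host_features are
hypothetical helpers, not part of PostgreSQL or LLVM.

```python
def parse_cpu_features(cpuinfo_text):
    """Return the set of CPU feature names found in /proc/cpuinfo text.

    Assumption: the features are listed on a line keyed "Features"
    (ARM/AArch64) or "flags" (x86), separated by whitespace. Only the
    first such line is used, i.e. the first listed core.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            return set(value.split())
    # No feature line found (unexpected format or non-Linux input).
    return set()


def read_host_features(path="/proc/cpuinfo"):
    """Read and parse the feature list of the machine this runs on."""
    with open(path) as f:
        return parse_cpu_features(f.read())
```

Calling sorted(read_host_features()) on the Pi gives a list that can be
compared against whatever extension the faulting instruction turns out
to need once gdb has identified it.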

Does it also trigger if you enable jit but do: "set
jit_optimize_above_cost = -1;"? Maybe the problem instruction is only
emitted at higher optimisation levels.

Thomas mentioned to me that he has seen issues in this area before,
albeit on x86 with a Celeron [2], when LLVM emitted an unsupported AVX
instruction.

David

[1] https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
[2] https://www.postgresql.org/message-id/CAEepm%3D1oLBeRjGw9RS6n%3Du0fE4t0WZMMawcfJopkmTmxRoefGw%40mail.gmail.com
