Re: Random memory related errors on live postgres 14.13 instance on Ubuntu 22.04 LTS

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Ian J Cottee <ian(at)cottee(dot)org>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Random memory related errors on live postgres 14.13 instance on Ubuntu 22.04 LTS
Date: 2024-11-03 21:27:49
Message-ID: CAM+6J97Xiq5KfqC1ZrJrUuASqOPJzMCKqiTRvjW21Y=f+T3zGg@mail.gmail.com
Lists: pgsql-general

On Sat, 2 Nov 2024 at 12:50, Ian J Cottee <ian(at)cottee(dot)org> wrote:
> As the previous errors (thankfully) are not showing now I can't really do any more debugging but I'll report back on the results of the memtest.

just for the sake of this thread, i wanted to include a summary of the
errors reported earlier on:

Stuck Spinlock Detected: This error indicates that a backend process
was unable to acquire a spinlock within the timeout, which can happen
due to contention or a lock that is never released. It often points to
performance issues or bugs in the PostgreSQL codebase.

Free(): Corrupted Unsorted Chunks: This message typically arises from
memory management issues, particularly when the memory allocator
detects corruption in its internal structures. This could be
indicative of bugs in the application code or PostgreSQL itself.

Double Free or Corruption (!prev): This error occurs when the program
attempts to free memory that has already been freed or is corrupted.
It can lead to crashes and is often linked to improper memory handling
in the application.

Corrupted Size vs. Prev_Size: Similar to the previous errors, this
suggests a mismatch between the sizes the allocator has recorded for
adjacent memory chunks, which can be caused by buffer overflows or
improper memory allocation.

Corrupted Double-Linked List: This indicates that the internal data
structures used for managing memory are corrupted, often resulting
from improper memory handling.

Stack Smashing Detected: This error occurs when a program writes more
data to a stack buffer than it can hold, corrupting adjacent memory.
It’s a common sign of buffer overflow vulnerabilities.

Segmentation Fault: A segmentation fault indicates that the program
tried to access an invalid memory location, leading to a crash. This
is often a result of programming errors such as dereferencing null
pointers or accessing out-of-bounds array elements.
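
To see which of the signatures listed above actually occurred, and how
often, the server log could be scanned for them. The following is only a
minimal Python sketch: the signature strings are lowercased versions of
the error names above, the function name is made up for illustration,
and the exact wording in a given log may differ.

```python
from collections import Counter

# Signatures derived from the error names listed above. Matching is
# case-insensitive because the capitalisation in the real log may differ.
SIGNATURES = [
    "stuck spinlock detected",
    "corrupted unsorted chunks",
    "double free or corruption",
    "corrupted size vs. prev_size",
    "corrupted double-linked list",
    "stack smashing detected",
    "segmentation fault",
]


def count_signatures(lines):
    """Count how many lines contain each known error signature.

    `lines` is any iterable of strings, for example an open log file.
    (Hypothetical helper, not part of PostgreSQL.)
    """
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for signature in SIGNATURES:
            if signature in lowered:
                counts[signature] += 1
    return counts
```

It could be called on an open log file, e.g.
`count_signatures(open(path, errors="replace"))`, where `path` is
wherever the server log lives on that installation.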

yes, i used perplexity to understand these errors and asked in what
scenarios they occur. it may not all be correct, or may even be wrong,
which i'll have to dig into more later.
but if these are not hardware errors, then there could be serious bugs,
particularly around buffer overflows, that can be exploited as
vulnerabilities.

if you have a standard installation from the ubuntu package binaries,
i think there might be some lower level C code in functions or
extensions also in play, which can lead to the above errors. i hope a
core was dumped somewhere; a stacktrace might have some info on what
led to those errors, and could help in understanding whether there is
unsafe code somewhere and which queries triggered it.
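
Whether a core file would have been written depends, among other
things, on the core size limit of the server process, which Linux
exposes in /proc/<pid>/limits. A rough Python sketch for reading that
limit follows; the function names are invented for illustration, and it
assumes the usual "Max core file size" row in that file.

```python
def core_limit_from_limits(text):
    """Return (soft, hard) core size limits parsed from the contents of
    a /proc/<pid>/limits file, or None if the row is missing.
    (Hypothetical helper, shown for illustration only.)
    """
    label = "Max core file size"
    for line in text.splitlines():
        if line.startswith(label):
            # The remaining columns are: soft limit, hard limit, units.
            fields = line[len(label):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


def core_dumps_enabled(text):
    """True if the soft core limit is present and not zero."""
    limit = core_limit_from_limits(text)
    return limit is not None and limit[0] != "0"
```

The text would come from reading /proc/<pid>/limits for the postmaster
PID. A soft limit of 0 means no core file is written for that process,
in which case no stacktrace can be recovered after the fact.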

I understand this forum is not for discussing AI responses ... but I
also did not want to ignore them due to my lack of knowledge ... hence
sharing.
as always, i am happy to be corrected if i am wrong.
