BUG #16990: Random PANIC in qemu user context

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: pguyot(at)kallisys(dot)net
Subject: BUG #16990: Random PANIC in qemu user context
Date: 2021-05-02 08:42:51
Message-ID: 16990-10b586bc699fd234@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16990
Logged by: Paul Guyot
Email address: pguyot(at)kallisys(dot)net
PostgreSQL version: 11.11
Operating system: qemu-arm-static chrooted raspios inside ubuntu
Description:

Within a GitHub Actions workflow, a qemu chrooted environment is created
from a RaspiOS lite image, within which the latest available PostgreSQL is
installed from apt (PostgreSQL 11.11).
Tests of embedded software are then executed, which include creating a
PostgreSQL database and performing a few benign operations (as far as
PostgreSQL is concerned). The tests run perfectly fine in a desktop-like
environment as well as on real devices.

Within this qemu context, PostgreSQL PANICs randomly yet quite frequently.
The latest log was the following:
2021-05-02 09:22:21.591 BST [15024] PANIC: stuck spinlock detected at
LWLockWaitListLock,
/build/postgresql-11-rRyn74/postgresql-11-11.11/build/../src/backend/storage/lmgr/lwlock.c:832
qemu: uncaught target signal 6 (Aborted) - core dumped
2021-05-02 09:22:21.597 BST [15022] PANIC: stuck spinlock detected at
LWLockWaitListLock,
/build/postgresql-11-rRyn74/postgresql-11-11.11/build/../src/backend/storage/lmgr/lwlock.c:832
qemu: uncaught target signal 6 (Aborted) - core dumped
2021-05-02 09:22:21.762 BST [15423] pynab(at)test_pynab PANIC: stuck spinlock
detected at LWLockWaitListLock,
/build/postgresql-11-rRyn74/postgresql-11-11.11/build/../src/backend/storage/lmgr/lwlock.c:832
2021-05-02 09:22:21.762 BST [15423] pynab(at)test_pynab STATEMENT: SELECT
"django_content_type"."id", "django_content_type"."app_label",
"django_content_type"."model" FROM "django_content_type" WHERE
"django_content_type"."app_label" = 'auth'
qemu: uncaught target signal 6 (Aborted) - core dumped
2021-05-02 09:22:24.481 BST [15011] LOG: server process (PID 15423) was
terminated by signal 6: Aborted
2021-05-02 09:22:24.481 BST [15011] DETAIL: Failed process was running:
SELECT "django_content_type"."id", "django_content_type"."app_label",
"django_content_type"."model" FROM "django_content_type" WHERE
"django_content_type"."app_label" = 'auth'
2021-05-02 09:22:24.481 BST [15011] LOG: terminating any other active
server processes
2021-05-02 09:22:24.567 BST [15011] LOG: all server processes terminated;
reinitializing
2021-05-02 09:22:24.601 BST [15512] LOG: database system was interrupted;
last known up at 2021-05-02 09:18:11 BST
2021-05-02 09:22:24.692 BST [15512] LOG: database system was not properly
shut down; automatic recovery in progress
2021-05-02 09:22:24.699 BST [15512] LOG: redo starts at 0/171E170
2021-05-02 09:22:25.045 BST [15512] LOG: invalid record length at
0/1957948: wanted 24, got 0
2021-05-02 09:22:25.046 BST [15512] LOG: redo done at 0/1957910
2021-05-02 09:22:25.048 BST [15512] LOG: last completed transaction was at
log time 2021-05-02 09:20:04.917746+01
2021-05-02 09:22:25.096 BST [15011] LOG: database system is ready to accept
connections

The log is publicly available here:
https://github.com/pguyot/pynab/runs/2485660214?check_suite_focus=true

Notice how sluggish the test is compared to a run in the same environment
where PostgreSQL doesn't PANIC. For example, this run completed without
error in under 20 minutes:
https://github.com/pguyot/pynab/runs/2483559259?check_suite_focus=true

I tried to update the CI script to upload the full Raspbian image when a
PANIC occurs, so that I can get my hands on the core dump, but the run is
so sluggish that I'm not sure it won't eventually time out. I wonder whether
this sluggishness is itself a cause of the PANIC. Could you please advise
on how to investigate this crash further?
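
A minimal sketch of how a CI step could detect such PANICs before deciding
whether to upload the image, assuming the server log is available as text
and uses the same line prefix as the excerpt above. The function name
find_panics and the sample text are hypothetical and not part of the actual
CI setup:

```python
import re

# Captures the PID in square brackets and the text after "PANIC:".
# Assumes the log line prefix seen in the excerpt above (timestamp, [pid]).
PANIC_RE = re.compile(r"\[(\d+)\].*?PANIC:\s*(.*)")


def find_panics(log_text):
    """Return a list of (pid, message) tuples, one per PANIC line."""
    panics = []
    for line in log_text.splitlines():
        match = PANIC_RE.search(line)
        if match:
            panics.append((int(match.group(1)), match.group(2).strip()))
    return panics


# Illustrative input copied from the wrapped log lines in this report.
sample = """\
2021-05-02 09:22:21.591 BST [15024] PANIC: stuck spinlock detected at
LWLockWaitListLock,
2021-05-02 09:22:24.481 BST [15011] LOG: server process (PID 15423) was
terminated by signal 6: Aborted
"""

panics = find_panics(sample)
```

A CI step could call find_panics on the server log and upload the image
only when the returned list is non-empty. Because the log lines are
wrapped, only the first line of each message is captured.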
