From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Subject: | suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?) |
Date: | 2025-02-26 22:08:57 |
Message-ID: | 67f7132d-3923-47a6-9de2-5b7d86ddb73f@vondra.me |
Lists: | pgsql-hackers |
Hi,
I have noticed that one of my buildfarm machines - widowbird - has not
reported any results since February 17. It seems to be stuck somewhere in
amcheck:
$ ps ax | grep postgres
1180067 ? Ss 0:02 /mnt/data/buildfarm/buildroot/HEAD/inst/bin/postgres -D data-C
1180069 ? Ss 0:00 postgres: checkpointer
1180070 ? Ss 0:00 postgres: background writer
1180072 ? Ss 0:00 postgres: walwriter
1180073 ? Ss 0:01 postgres: autovacuum launcher
1180074 ? Ss 0:00 postgres: logical replication launcher
1180107 ? Ss 0:05 postgres: buildfarm contrib_regression_amcheck [local] INSERT
1180111 ? Ss 0:00 postgres: autovacuum worker
1180134 ? Ss 0:00 postgres: autovacuum worker
1180135 ? Ss 0:00 postgres: autovacuum worker
1374029 pts/0 S+ 0:00 grep --color=auto postgres
So there's PID 1180107, executing an insert, but not progressing. The
backtrace looks like this (first couple lines, full backtrace attached):
#0 0x0000007fa64b8ddc in __GI_epoll_pwait (epfd=5, events=0x55ad6285a8,
maxevents=1, timeout=timeout(at)entry=-1, set=set(at)entry=0x0) at
../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1 0x0000007fa64b8fe8 in epoll_wait (epfd=<optimized out>,
events=<optimized out>, maxevents=<optimized out>, timeout=timeout(at)entry=-1)
at ../sysdeps/unix/sysv/linux/epoll_wait.c:32
#2 0x000000558f043588 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ff8ed4e18, cur_timeout=-1, set=0x55ad628540) at
latch.c:1571
#3 WaitEventSetWait (set=0x55ad628540, timeout=timeout(at)entry=-1,
occurred_events=occurred_events(at)entry=0x7ff8ed4e18,
nevents=nevents(at)entry=1,
wait_event_info=wait_event_info(at)entry=134217781) at latch.c:1519
#4 0x000000558f043778 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents(at)entry=33, timeout=timeout(at)entry=-1,
wait_event_info=wait_event_info(at)entry=134217781)
at latch.c:538
#5 0x000000558f052274 in ConditionVariableTimedSleep (cv=0x7f9ac9deb0,
timeout=timeout(at)entry=-1,
wait_event_info=wait_event_info(at)entry=134217781) at condition_variable.c:163
#6 0x000000558f05286c in ConditionVariableTimedSleep
(wait_event_info=134217781, timeout=-1, cv=<optimized out>) at
condition_variable.c:135
#7 0x000000558ed2fc90 in AdvanceXLInsertBuffer
(upto=upto(at)entry=608174080, tli=tli(at)entry=1,
opportunistic=opportunistic(at)entry=false) at xlog.c:2224
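For readers decoding the raw integers in the frames above, here is a minimal sketch. It assumes the wait-class masks and WL_* flag bits match the PostgreSQL headers (the values below are from memory and should be checked against the tree being debugged). The function names are hypothetical helpers, not PostgreSQL APIs, and the class table is deliberately partial.

```python
# Hypothetical helpers for decoding values seen in the backtrace.
# Constants are assumed to match PostgreSQL's headers; verify before relying on them.

# High byte of wait_event_info selects the wait class (partial list).
WAIT_CLASSES = {
    0x01000000: "LWLock",
    0x03000000: "Lock",
    0x05000000: "Activity",
    0x06000000: "Client",
    0x07000000: "Extension",
    0x08000000: "IPC",
    0x09000000: "Timeout",
    0x0A000000: "IO",
}

# Bit flags passed to WaitLatch() as wakeEvents.
WL_FLAGS = {
    1 << 0: "WL_LATCH_SET",
    1 << 1: "WL_SOCKET_READABLE",
    1 << 2: "WL_SOCKET_WRITEABLE",
    1 << 3: "WL_TIMEOUT",
    1 << 4: "WL_POSTMASTER_DEATH",
    1 << 5: "WL_EXIT_ON_PM_DEATH",
}


def decode_wait_event_info(value):
    """Split wait_event_info into (class name, numeric event id)."""
    class_id = value & 0xFF000000
    event_id = value & 0x0000FFFF
    return WAIT_CLASSES.get(class_id, hex(class_id)), event_id


def decode_wake_events(value):
    """Return the names of the WL_* bits set in wakeEvents."""
    return [name for bit, name in WL_FLAGS.items() if value & bit]


def format_lsn(value):
    """Render a 64-bit WAL position in the usual hi/lo hex notation."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)


if __name__ == "__main__":
    print(decode_wait_event_info(134217781))  # wait_event_info in frames #3-#6
    print(decode_wake_events(33))             # wakeEvents in frame #4
    print(format_lsn(608174080))              # upto in frame #7
```

Under those assumptions, wait_event_info=134217781 is an IPC-class wait with event id 53, wakeEvents=33 is WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, and upto=608174080 corresponds to WAL position 0/24400000. Mapping event id 53 to a wait event name depends on the exact source tree, so it is left numeric here.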
So, it's stuck in AdvanceXLInsertBuffer ... interesting. Another
interesting fact is that it's testing 75dfde13639, which is just a couple
of commits after 6a2275b895:
commit 6a2275b8953a4462d44daf001bdd60b3d48f0946
Author: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
Date: Mon Feb 17 04:19:01 2025 +0200
Get rid of WALBufMappingLock
Allow multiple backends to initialize WAL buffers concurrently.
This way `MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in
parallel without taking a single LWLock in exclusive mode.
...
which reworked AdvanceXLInsertBuffer() quite a bit, it seems. OTOH the
last (successful) run on widowbird was on eaf502747b, which already
includes 6a2275b895, so maybe it's unrelated.
Is there something else I could collect from the stuck instance, before
I restart it?
regards
--
Tomas Vondra
Attachment | Content-Type | Size |
---|---|---|
widowbird.log | text/x-log | 5.2 KB |