BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Date: 2021-10-29 13:00:03
Message-ID: 17257-1e46de26bec11433@postgresql.org

The following bug has been logged on the website:

Bug reference: 17257
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 14.0
Operating system: Ubuntu 20.04
Description:

When running concurrent installchecks (x100), I observed an autovacuum
process hanging; note the autovacuum worker below spinning at 99% CPU with
over five hours of accumulated CPU time:

tester  328713  328706  0 12:19 ?  00:02:03 postgres: logical replication launcher
tester 3301788  328706  0 16:33 ?  00:00:02 postgres: postgres regress030 127.0.0.1(44534) CREATE TABLE
tester 3302076  328706 99 16:33 ?  05:17:10 postgres: autovacuum worker regress030
tester 3311424  328706  0 16:34 ?  00:03:09 postgres: postgres regress090 127.0.0.1(48744) ALTER TABLE
tester 3311453  328706  1 16:34 ?  00:03:12 postgres: postgres regress048 127.0.0.1(48768) ALTER TABLE
tester 3311461  328706  1 16:34 ?  00:03:11 postgres: postgres regress017 127.0.0.1(48774) DROP TABLE
tester 3311490  328706  0 16:34 ?  00:03:09 postgres: postgres regress006 127.0.0.1(48804) ALTER TABLE
tester 3311493  328706  0 16:34 ?  00:03:08 postgres: postgres regress008 127.0.0.1(48806) ALTER TABLE

gdb showed the following stack trace:

0x0000564d6c93cd1f in GetPrivateRefCountEntry (buffer=<optimized out>, do_move=<optimized out>) at bufmgr.c:312
312             Assert(BufferIsValid(buffer));
(gdb) bt
#0  0x0000564d6c93cd1f in GetPrivateRefCountEntry (buffer=<optimized out>, do_move=<optimized out>) at bufmgr.c:312
#1  0x0000564d6c93fb79 in GetPrivateRefCount (buffer=111422) at bufmgr.c:398
#2  BufferGetBlockNumber (buffer=buffer@entry=111422) at bufmgr.c:2771
#3  0x0000564d6b83519d in heap_prune_chain (prstate=0x7fff07f03e10, rootoffnum=7, buffer=111422) at pruneheap.c:625
#4  heap_page_prune (relation=relation@entry=0x7fa60a9cff80, buffer=buffer@entry=111422, vistest=vistest@entry=0x564d710816a0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>) at pruneheap.c:278
#5  0x0000564d6b84a7d4 in lazy_scan_prune (vacrel=0x625000044190, buf=<optimized out>, blkno=<optimized out>, page=0x7fa642bdd580 <incomplete sequence \334>, vistest=<optimized out>, prunestate=<optimized out>) at vacuumlazy.c:1741
#6  0x0000564d6b8597cd in lazy_scan_heap (aggressive=<optimized out>, params=<optimized out>, vacrel=<optimized out>) at vacuumlazy.c:1384
#7  heap_vacuum_rel (rel=<optimized out>, params=<optimized out>, bstrategy=<optimized out>) at vacuumlazy.c:638
#8  0x0000564d6bf9b6b6 in table_relation_vacuum (bstrategy=<optimized out>, params=0x6250000272c4, rel=0x7fa60a9cff80) at ../../../src/include/access/tableam.h:1678
#9  vacuum_rel (relid=<optimized out>, relation=<optimized out>, params=0x6250000272c4) at vacuum.c:2034
#10 0x0000564d6bfa11f0 in vacuum (relations=0x625000041278, relations@entry=0x6250000552b0, params=params@entry=0x6250000272c4, bstrategy=<optimized out>, bstrategy@entry=0x625000053048, isTopLevel=isTopLevel@entry=true) at vacuum.c:475
#11 0x0000564d6c6e3a37 in autovacuum_do_vac_analyze (bstrategy=0x625000053048, tab=0x6250000272c0) at autovacuum.c:3242
#12 do_autovacuum () at autovacuum.c:2490
#13 0x0000564d6c6e6681 in AutoVacWorkerMain (argv=0x0, argc=0) at autovacuum.c:1714
#14 0x0000564d6c6e6a95 in StartAutoVacWorker () at autovacuum.c:1499
#15 0x0000564d6c7272d4 in StartAutovacuumWorker () at postmaster.c:5561
#16 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5266
#17 <signal handler called>
#18 __GI___sigprocmask (how=2, set=<optimized out>, oset=0x0) at ../sysdeps/unix/sysv/linux/sigprocmask.c:39
#19 0x0000564d6b52efc4 in __interceptor_sigprocmask.part.0 ()
#20 0x0000564d6c727d9e in ServerLoop () at postmaster.c:1707
#21 0x0000564d6c72c309 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x603000000250) at postmaster.c:1417
#22 0x0000564d6b4f6995 in main (argc=3, argv=0x603000000250) at main.c:209

(gdb) b 1820
Breakpoint 1 at 0x564d6b84b378: file vacuumlazy.c, line 1820.
(gdb) cont
Continuing.

Breakpoint 1, lazy_scan_prune (vacrel=0x625000044190, buf=<optimized out>, blkno=<optimized out>, page=0x7fa642bdd580 <incomplete sequence \334>, vistest=<optimized out>, prunestate=<optimized out>) at vacuumlazy.c:1820
1820                    goto retry;
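
(Not part of the captured session: one quick way to confirm that this breakpoint keeps firing in a tight loop, rather than being hit once per page, is to let gdb silently absorb a large number of hits:

(gdb) ignore 1 100000
(gdb) cont

If the retry really is endless, the ignore count is exhausted within seconds.)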

(gdb) print *vacrel
$15 = {rel = 0x7fa60a9cff80, indrels = 0x625000043b70, nindexes = 5, failsafe_active = false,
  consider_bypass_optimization = true, do_index_vacuuming = true, do_index_cleanup = true, do_rel_truncate = true,
  bstrategy = 0x625000053048, lps = 0x0, old_rel_pages = 6, old_live_tuples = 133, relfrozenxid = 726, relminmxid = 1,
  OldestXmin = 48155004, FreezeLimit = 4293122300, MultiXactCutoff = 4289973311,
  relnamespace = 0x625000044140 "pg_catalog", relname = 0x625000044168 "pg_constraint", indname = 0x0, blkno = 11,
  offnum = 32, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, dead_tuples = 0x62a0000a8240, rel_pages = 13, scanned_pages = 12,
  pinskipped_pages = 0, frozenskipped_pages = 0, tupcount_pages = 12, pages_removed = 0, lpdead_item_pages = 9,
  nonempty_pages = 11, new_rel_tuples = 0, new_live_tuples = 0, indstats = 0x6250000444d8, num_index_scans = 0,
  tuples_deleted = 9, lpdead_items = 226, new_dead_tuples = 0, num_tuples = 237, live_tuples = 237}

I've also caught similar hangs when (auto)vacuuming pg_attrdef,
pg_shdepend, and pg_class.

It seems that this eternal retry is caused by a tuple that remains in the
HEAPTUPLE_RECENTLY_DEAD state while it is processed in the lazy_scan_prune()
loop.
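
To make that hypothesis concrete, below is a minimal standalone C sketch of the suspected livelock. This is NOT PostgreSQL source; all names and XID numbers are invented for illustration. The assumption it encodes: heap_page_prune() judges removability via the vistest horizon (GlobalVisCatalogRels in frame #4 above), while lazy_scan_prune() re-checks surviving tuples against vacrel->OldestXmin (48155004 in the dump above). If the pruning horizon lags behind OldestXmin, a tuple is RECENTLY_DEAD for pruning but already dead for the re-check, and the "goto retry" at vacuumlazy.c:1820 fires forever.

/*
 * Standalone sketch of the suspected livelock (NOT PostgreSQL source;
 * all names and XID numbers are invented).  Two visibility checks use
 * different horizons: the pruning side never removes the tuple, the
 * scanning side insists it is dead, so the retry never terminates.
 */
#include <stdio.h>

typedef enum { LIVE, RECENTLY_DEAD, DEAD } TupleState;

/* A tuple deleted by transaction xmax is dead once horizon > xmax. */
static TupleState
visibility_check(unsigned xmax, unsigned horizon)
{
	if (xmax == 0)
		return LIVE;
	return horizon > xmax ? DEAD : RECENTLY_DEAD;
}

int
main(void)
{
	unsigned xmax = 100;			/* XID that deleted the tuple */
	unsigned prune_horizon = 95;	/* stale horizon used for pruning */
	unsigned oldest_xmin = 105;		/* fresher horizon used on re-check */
	int		removed = 0;
	int		attempts = 0;

retry:
	attempts++;

	/* Pruning step: removes the tuple only if DEAD by its horizon. */
	if (visibility_check(xmax, prune_horizon) == DEAD)
		removed = 1;

	/* Re-check step: a dead-but-unpruned tuple forces a restart. */
	if (!removed && visibility_check(xmax, oldest_xmin) == DEAD)
	{
		if (attempts < 5)		/* cap exists only in this sketch */
			goto retry;
		printf("dead but never pruned after %d attempts; "
			   "the real loop would retry forever\n", attempts);
	}
	return 0;
}

The attempt cap is there only so the sketch terminates; the loop in lazy_scan_prune() has no such cap, which would match the 99%-CPU worker shown above.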
