REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index stuck waiting for transaction from the future in PG 13.16

From: Marcin Barczyński <mba(dot)ogolny(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index stuck waiting for transaction from the future in PG 13.16
Date: 2024-08-16 08:29:15
Message-ID: CAP3o3PdTOTBngAq7cBo6DymqjAv+apMZv22tykf41r8HntUwBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello!

REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index got stuck
waiting for a transaction from the future in PG 13.16. Below is some
additional data that I was able to gather.

Index size is small:

postgres=# SELECT
pg_size_pretty(pg_table_size('pg_default_acl_role_nsp_obj_index'));
pg_size_pretty
----------------
16 kB
(1 row)

pg_locks showed that all locks had been granted:

postgres=# select * from pg_locks;
locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid
| mode | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+----------
relation | 14172 | 12141 | | | |
| | | | 4/158 | 87021 |
AccessShareLock | t | t
virtualxid | | | | | 4/158 |
| | | | 4/158 | 87021 |
ExclusiveLock | t | t
virtualxid | | | | | 3/28 |
| | | | 3/28 | 86635 |
ExclusiveLock | t | t
transactionid | | | | | |
792 | | | | 3/28 | 86635 |
ExclusiveLock | t | f
relation | 16385 | 826 | | | |
| | | | 3/28 | 86635 |
ShareLock | t | f
relation | 16385 | 827 | | | |
| | | | 3/28 | 86635 |
AccessExclusiveLock | t | f
(6 rows)

Stack trace:

(gdb) bt
#0 0x00007f8790f04199 in __GI___select (nfds=nfds(at)entry=0,
readfds=readfds(at)entry=0x0, writefds=writefds(at)entry=0x0,
exceptfds=exceptfds(at)entry=0x0, timeout=timeout(at)entry=0x7fffd823be20)
at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x000000000082d3d9 in pg_usleep (microsec=1000) at ../port/pgsleep.c:56
#2 pg_usleep (microsec=1000) at ../port/pgsleep.c:47
#3 XactLockTableWait (xid=2004, rel=<optimized out>, ctid=<optimized
out>, oper=XLTW_InsertIndexUnique) at storage/lmgr/lmgr.c:690
#4 0x000000000054b44b in heapam_index_build_range_scan
(heapRelation=0x7f86ca209a38, indexRelation=0x7f86ca20a5f8,
indexInfo=0x10cd008, allow_sync=<optimized out>, anyvisible=false,
progress=true,
start_blockno=0, numblocks=4294967295, callback=0x568530
<_bt_build_callback>, callback_state=0x7fffd823c3b0, scan=0x10ca408)
at access/heap/heapam_handler.c:1552
#5 0x000000000056a3f8 in table_index_build_scan (scan=0x0,
callback_state=0x7fffd823c3b0, callback=0x568530 <_bt_build_callback>,
progress=true, allow_sync=true, index_info=0x10cd008,
index_rel=0x7f86ca20a5f8, table_rel=0x7f86ca209a38) at
access/brin/../../../../src/include/access/tableam.h:1536
#6 _bt_spools_heapscan (indexInfo=0x10cd008,
buildstate=0x7fffd823c3b0, index=..., heap=0x7f86ca209a38) at
access/nbtree/nbtsort.c:478
#7 btbuild (heap=0x7f86ca209a38, index=0x7f86ca20a5f8,
indexInfo=0x10cd008) at access/nbtree/nbtsort.c:326
#8 0x00000000005ca5bc in index_build (heapRelation=0x7f86ca209a38,
indexRelation=0x7f86ca20a5f8, indexInfo=0x10cd008,
isreindex=<optimized out>, parallel=<optimized out>) at
catalog/index.c:3012
#9 0x00000000005cafbc in reindex_index (indexId=827,
skip_constraint_checks=<optimized out>, persistence=<optimized out>,
options=2) at catalog/index.c:3654
#10 0x00000000006666ed in ReindexIndex (indexRelation=<optimized out>,
options=0, concurrent=<optimized out>) at commands/indexcmds.c:2589
#11 0x0000000000846187 in standard_ProcessUtility (pstmt=<optimized
out>, queryString=0x1084758 "REINDEX INDEX
pg_catalog.pg_default_acl_role_nsp_obj_index",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x10855e0, qc=0x7fffd823ccf0) at tcop/utility.c:946
#12 0x00007f8790922c1d in pgss_ProcessUtility () from
/usr/pgsql-13/lib/pg_stat_statements.so
#13 0x0000000000846a39 in ProcessUtility (qc=0x7fffd823ccf0,
dest=0x10855e0, queryEnv=<optimized out>, params=<optimized out>,
context=PROCESS_UTILITY_TOPLEVEL, queryString=<optimized out>,
pstmt=0x1085320)
at tcop/utility.c:520
#14 PortalRunUtility (portal=portal(at)entry=0x115b748,
pstmt=pstmt(at)entry=0x1085320, isTopLevel=isTopLevel(at)entry=true,
setHoldSnapshot=setHoldSnapshot(at)entry=false, dest=0x10855e0,
qc=0x7fffd823ccf0)
at tcop/pquery.c:1153
#15 0x0000000000846bac in PortalRunMulti
(portal=portal(at)entry=0x115b748, isTopLevel=isTopLevel(at)entry=true,
setHoldSnapshot=setHoldSnapshot(at)entry=false,
dest=dest(at)entry=0x10855e0,
altdest=altdest(at)entry=0x10855e0, qc=qc(at)entry=0x7fffd823ccf0) at
tcop/pquery.c:1316
#16 0x0000000000847060 in PortalRun (portal=0x115b748,
count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimized out>, dest=0x10855e0, altdest=0x10855e0,
qc=0x7fffd823ccf0) at tcop/pquery.c:788
#17 0x0000000000840083 in exec_simple_query (query_string=0x1084758
"REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index") at
tcop/postgres.c:1241
#18 0x00000000008426fe in PostgresMain (argc=<optimized out>,
argv=<optimized out>, dbname=<optimized out>, username=<optimized
out>) at tcop/postgres.c:4426
#19 0x00000000007c6eaf in BackendRun (port=0x10cdb00) at
postmaster/postmaster.c:4560
#20 BackendStartup (port=0x10cdb00) at postmaster/postmaster.c:4244
#21 ServerLoop () at postmaster/postmaster.c:1742
#22 0x00000000007c7d27 in PostmasterMain (argc=3, argv=<optimized
out>) at postmaster/postmaster.c:1415
#23 0x00000000004fad81 in main (argc=3, argv=0x103f7b0) at main/main.c:210

The process waited for txid 2004 to complete.
But according to pg latest committed xid was:

(gdb) call ShmemVariableCache->latestCompletedXid
$5 = 791

And this was consistent with the fact that REINDEX was in transaction 792:

16385 | redacted | 86635 | | 16384 | redacted |
redacted | 192.168.10.163 | | 39826 |
2024-08-14 08:23:39.689327-04 | 2024-08-14 08:23:39.712004-04 |
2024-08-1
4 08:23:39.712004-04 | 2024-08-14 08:23:39.712005-04 |
| | active | 792 | 792 | REINDEX
INDEX pg_catalog.pg_default_acl_role_nsp_obj_index | client backend

Based on the following line in the source code:

xwait = HeapTupleHeaderGetUpdateXid(heapTuple->t_data);

I managed to figure out that the tuple was in a table with oid 826:

(gdb) print *heapTuple
$4 = {t_len = 82, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0},
ip_posid = 13}, t_tableOid = 826, t_data = 0x7f2c0ab12eb8}

which is:

postgres=# select relname from pg_class where oid = 826;
relname
----------------
pg_default_acl
(1 row)

And here is the content of `pg_default_acl`:

postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_default_acl', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid
| t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |
t_data

----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+-------------------------------------------------------------------------------
-----------------
1 | 8120 | 1 | 70 | 500 | 1941 | 0 | (0,3)
| 16389 | 1282 | 24 | | |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
2 | 8048 | 1 | 70 | 500 | 1941 | 1 | (0,4)
| 16389 | 1282 | 24 | | |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
3 | 7976 | 1 | 70 | 1941 | 2019 | 0 | (0,5)
| 49157 | 8450 | 24 | | |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
4 | 7904 | 1 | 70 | 1941 | 2019 | 1 | (0,6)
| 49157 | 8450 | 24 | | |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
5 | 7832 | 1 | 70 | 2019 | 0 | 0 | (0,5)
| 32773 | 10242 | 24 | | |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
6 | 7760 | 1 | 70 | 2019 | 0 | 1 | (0,6)
| 32773 | 10242 | 24 | | |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
(6 rows)

The instance is running in a virtual machine, so I can gather more
data if needed. If it makes a difference, the instance has logical
replication set up. Stopping the replication did not help.

postgres=# select version();
version
-----------------------------------------------------------------------------------------------------------
PostgreSQL 13.16 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1
20231218 (Red Hat 11.4.1-3), 64-bit
(1 row)

--
Best regards,
Marcin Barczyński

Browse pgsql-bugs by date

  From Date Subject
Previous Message Daniel Gustafsson 2024-08-16 08:11:32 Re: TLS session tickets disabled?