| From: | Marcin Barczyński <mba(dot)ogolny(at)gmail(dot)com> | 
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Subject: | REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index stuck waiting for transaction from the future in PG 13.16 | 
| Date: | 2024-08-16 08:29:15 | 
| Message-ID: | CAP3o3PdTOTBngAq7cBo6DymqjAv+apMZv22tykf41r8HntUwBQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-bugs | 
Hello!
REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index got stuck
waiting for a transaction from the future in PG 13.16. Below is some
additional data that I was able to gather.
Index size is small:
postgres=# SELECT
pg_size_pretty(pg_table_size('pg_default_acl_role_nsp_obj_index'));
 pg_size_pretty
----------------
 16 kB
(1 row)
pg_locks showed that all locks had been granted:
postgres=# select * from pg_locks;
   locktype    | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction |  pid
 |        mode         | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+----------
 relation      |    14172 |    12141 |      |       |            |
          |         |       |          | 4/158              | 87021 |
AccessShareLock     | t       | t
 virtualxid    |          |          |      |       | 4/158      |
          |         |       |          | 4/158              | 87021 |
ExclusiveLock       | t       | t
 virtualxid    |          |          |      |       | 3/28       |
          |         |       |          | 3/28               | 86635 |
ExclusiveLock       | t       | t
 transactionid |          |          |      |       |            |
      792 |         |       |          | 3/28               | 86635 |
ExclusiveLock       | t       | f
 relation      |    16385 |      826 |      |       |            |
          |         |       |          | 3/28               | 86635 |
ShareLock           | t       | f
 relation      |    16385 |      827 |      |       |            |
          |         |       |          | 3/28               | 86635 |
AccessExclusiveLock | t       | f
(6 rows)
Stack trace:
(gdb) bt
#0  0x00007f8790f04199 in __GI___select (nfds=nfds(at)entry=0,
readfds=readfds(at)entry=0x0, writefds=writefds(at)entry=0x0,
exceptfds=exceptfds(at)entry=0x0, timeout=timeout(at)entry=0x7fffd823be20)
    at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x000000000082d3d9 in pg_usleep (microsec=1000) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=1000) at ../port/pgsleep.c:47
#3  XactLockTableWait (xid=2004, rel=<optimized out>, ctid=<optimized
out>, oper=XLTW_InsertIndexUnique) at storage/lmgr/lmgr.c:690
#4  0x000000000054b44b in heapam_index_build_range_scan
(heapRelation=0x7f86ca209a38, indexRelation=0x7f86ca20a5f8,
indexInfo=0x10cd008, allow_sync=<optimized out>, anyvisible=false,
progress=true,
    start_blockno=0, numblocks=4294967295, callback=0x568530
<_bt_build_callback>, callback_state=0x7fffd823c3b0, scan=0x10ca408)
at access/heap/heapam_handler.c:1552
#5  0x000000000056a3f8 in table_index_build_scan (scan=0x0,
callback_state=0x7fffd823c3b0, callback=0x568530 <_bt_build_callback>,
progress=true, allow_sync=true, index_info=0x10cd008,
    index_rel=0x7f86ca20a5f8, table_rel=0x7f86ca209a38) at
access/brin/../../../../src/include/access/tableam.h:1536
#6  _bt_spools_heapscan (indexInfo=0x10cd008,
buildstate=0x7fffd823c3b0, index=..., heap=0x7f86ca209a38) at
access/nbtree/nbtsort.c:478
#7  btbuild (heap=0x7f86ca209a38, index=0x7f86ca20a5f8,
indexInfo=0x10cd008) at access/nbtree/nbtsort.c:326
#8  0x00000000005ca5bc in index_build (heapRelation=0x7f86ca209a38,
indexRelation=0x7f86ca20a5f8, indexInfo=0x10cd008,
isreindex=<optimized out>, parallel=<optimized out>) at
catalog/index.c:3012
#9  0x00000000005cafbc in reindex_index (indexId=827,
skip_constraint_checks=<optimized out>, persistence=<optimized out>,
options=2) at catalog/index.c:3654
#10 0x00000000006666ed in ReindexIndex (indexRelation=<optimized out>,
options=0, concurrent=<optimized out>) at commands/indexcmds.c:2589
#11 0x0000000000846187 in standard_ProcessUtility (pstmt=<optimized
out>, queryString=0x1084758 "REINDEX INDEX
pg_catalog.pg_default_acl_role_nsp_obj_index",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
    queryEnv=0x0, dest=0x10855e0, qc=0x7fffd823ccf0) at tcop/utility.c:946
#12 0x00007f8790922c1d in pgss_ProcessUtility () from
/usr/pgsql-13/lib/pg_stat_statements.so
#13 0x0000000000846a39 in ProcessUtility (qc=0x7fffd823ccf0,
dest=0x10855e0, queryEnv=<optimized out>, params=<optimized out>,
context=PROCESS_UTILITY_TOPLEVEL, queryString=<optimized out>,
pstmt=0x1085320)
    at tcop/utility.c:520
#14 PortalRunUtility (portal=portal(at)entry=0x115b748,
pstmt=pstmt(at)entry=0x1085320, isTopLevel=isTopLevel(at)entry=true,
setHoldSnapshot=setHoldSnapshot(at)entry=false, dest=0x10855e0,
qc=0x7fffd823ccf0)
    at tcop/pquery.c:1153
#15 0x0000000000846bac in PortalRunMulti
(portal=portal(at)entry=0x115b748, isTopLevel=isTopLevel(at)entry=true,
setHoldSnapshot=setHoldSnapshot(at)entry=false,
dest=dest(at)entry=0x10855e0,
    altdest=altdest(at)entry=0x10855e0, qc=qc(at)entry=0x7fffd823ccf0) at
tcop/pquery.c:1316
#16 0x0000000000847060 in PortalRun (portal=0x115b748,
count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimized out>, dest=0x10855e0, altdest=0x10855e0,
qc=0x7fffd823ccf0) at tcop/pquery.c:788
#17 0x0000000000840083 in exec_simple_query (query_string=0x1084758
"REINDEX INDEX  pg_catalog.pg_default_acl_role_nsp_obj_index") at
tcop/postgres.c:1241
#18 0x00000000008426fe in PostgresMain (argc=<optimized out>,
argv=<optimized out>, dbname=<optimized out>, username=<optimized
out>) at tcop/postgres.c:4426
#19 0x00000000007c6eaf in BackendRun (port=0x10cdb00) at
postmaster/postmaster.c:4560
#20 BackendStartup (port=0x10cdb00) at postmaster/postmaster.c:4244
#21 ServerLoop () at postmaster/postmaster.c:1742
#22 0x00000000007c7d27 in PostmasterMain (argc=3, argv=<optimized
out>) at postmaster/postmaster.c:1415
#23 0x00000000004fad81 in main (argc=3, argv=0x103f7b0) at main/main.c:210
The process waited for txid 2004 to complete.
But according to pg latest committed xid was:
(gdb) call ShmemVariableCache->latestCompletedXid
$5 = 791
And this was consistent with the fact that REINDEX was in transaction 792:
 16385 | redacted | 86635 |            |    16384 | redacted |
redacted    | 192.168.10.163 |                 |       39826 |
2024-08-14 08:23:39.689327-04 | 2024-08-14 08:23:39.712004-04 |
2024-08-1
4 08:23:39.712004-04 | 2024-08-14 08:23:39.712005-04 |
|                     | active |         792 |          792 | REINDEX
INDEX  pg_catalog.pg_default_acl_role_nsp_obj_index | client backend
Based on the following line in the source code:
xwait = HeapTupleHeaderGetUpdateXid(heapTuple->t_data);
I managed to figure out that the tuple was in a table with oid 826:
(gdb) print *heapTuple
$4 = {t_len = 82, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0},
ip_posid = 13}, t_tableOid = 826, t_data = 0x7f2c0ab12eb8}
which is:
postgres=# select relname from pg_class where oid = 826;
    relname
----------------
 pg_default_acl
(1 row)
And here is the content of `pg_default_acl`:
postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_default_acl', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid
| t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |
                            t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+-------------------------------------------------------------------------------
-----------------
  1 |   8120 |        1 |     70 |    500 |   1941 |        0 | (0,3)
|       16389 |       1282 |     24 |        |       |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  2 |   8048 |        1 |     70 |    500 |   1941 |        1 | (0,4)
|       16389 |       1282 |     24 |        |       |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  3 |   7976 |        1 |     70 |   1941 |   2019 |        0 | (0,5)
|       49157 |       8450 |     24 |        |       |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  4 |   7904 |        1 |     70 |   1941 |   2019 |        1 | (0,6)
|       49157 |       8450 |     24 |        |       |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  5 |   7832 |        1 |     70 |   2019 |      0 |        0 | (0,5)
|       32773 |      10242 |     24 |        |       |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  6 |   7760 |        1 |     70 |   2019 |      0 |        1 | (0,6)
|       32773 |      10242 |     24 |        |       |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
(6 rows)
The instance is running in a virtual machine, so I can gather more
data if needed. If it makes a difference, the instance has logical
replication set up. Stopping the replication did not help.
postgres=# select version();
                                                  version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 13.16 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1
20231218 (Red Hat 11.4.1-3), 64-bit
(1 row)
-- 
Best regards,
Marcin Barczyński
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2024-08-16 14:44:25 | Re: TLS session tickets disabled? | 
| Previous Message | Daniel Gustafsson | 2024-08-16 08:11:32 | Re: TLS session tickets disabled? |