BUG #18386: Slow connection access after 'vacuum full pg_attribute'

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: 1165125080(at)qq(dot)com
Subject: BUG #18386: Slow connection access after 'vacuum full pg_attribute'
Date: 2024-03-11 12:48:39
Message-ID: 18386-59bf41fb0332a05d@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18386
Logged by: ji xiaohang
Email address: 1165125080(at)qq(dot)com
PostgreSQL version: 14.8
Operating system: CentOS 7
Description:

My database contains a large number of tables and views, resulting in a very
large pg_attribute table.
I ran VACUUM FULL pg_attribute and then restarted the database.
Afterwards, I found that the first new connection took several minutes to
complete.
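
For context, here is a rough sketch of how to reproduce this shape of
problem (my reconstruction, not the original workload; the loop count and
table names are made up):

-- Hypothetical reproduction sketch: bloat pg_attribute with many
-- relations, then rewrite it with VACUUM FULL.
DO $$
BEGIN
    FOR i IN 1..100000 LOOP
        EXECUTE format('CREATE TABLE bloat_%s (c1 int, c2 text, c3 date)', i);
    END LOOP;
END $$;
VACUUM FULL pg_attribute;
-- Restart the server, then time how long the first new connection takes.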
The backtrace during the slow connection attempt is as follows:

#0 0x00007faece149c64 in pread64 () from /usr/lib64/libpthread.so.0
#1 0x000055b919b8bc0c in FileRead (file=11, buffer=0x7fae37909580
"\035\002", amount=8192, offset=1019985920, wait_event_info=167772173) at
fd.c:1889
#2 0x000055b919bbdf39 in mdread (reln=0x55b91b6a9120, forknum=MAIN_FORKNUM,
blocknum=779870, buffer=0x7fae37909580 "\035\002") at md.c:638
#3 0x000055b919bbfefe in smgrread (reln=0x55b91b6a9120,
forknum=MAIN_FORKNUM, blocknum=779870, buffer=0x7fae37909580 "\035\002") at
smgr.c:590
#4 0x000055b919b83797 in ReadBuffer_common (smgr=0x55b91b6a9120,
relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=779870,
mode=RBM_NORMAL, strategy=0x55b91b69b490, hit=0x7ffda51d9e43) at
bufmgr.c:901
#5 0x000055b919b82f8c in ReadBufferExtended (reln=0x7fae35495758,
forkNum=MAIN_FORKNUM, blockNum=779870, mode=RBM_NORMAL,
strategy=0x55b91b69b490) at bufmgr.c:660
#6 0x000055b9197bd18e in heapgetpage (sscan=0x55b91b69a8d0, page=779870) at
heapam.c:383
#7 0x000055b9197bec8b in heapgettup_pagemode (scan=0x55b91b69a8d0,
dir=ForwardScanDirection, nkeys=2, key=0x55b91b69a450) at heapam.c:1079
#8 0x000055b9197bf1a1 in heap_getnextslot (sscan=0x55b91b69a8d0,
direction=ForwardScanDirection, slot=0x55b91b69b1f0) at heapam.c:1372
#9 0x000055b9197e2bfa in table_scan_getnextslot (sscan=0x55b91b69a8d0,
direction=ForwardScanDirection, slot=0x55b91b69b1f0) at
../../../../src/include/access/tableam.h:889
#10 0x000055b9197e34b8 in systable_getnext (sysscan=0x55b91b69a400) at
genam.c:471
#11 0x000055b919d3cb44 in RelationBuildTupleDesc (relation=0x7fae35499178)
at relcache.c:580
#12 0x000055b919d3dd09 in RelationBuildDesc (targetRelId=2662,
insertIt=true) at relcache.c:1207
#13 0x000055b919d42867 in load_critical_index (indexoid=2662, heapoid=1259)
at relcache.c:4146
#14 0x000055b919d423ee in RelationCacheInitializePhase3 () at
relcache.c:3900
#15 0x000055b919d68640 in InitPostgres (in_dbname=0x55b91b615c50 "conetdb",
dboid=0, username=0x55b91b615c38 "root", useroid=0, out_dbname=0x0,
override_allow_connections=false) at postinit.c:1035
#16 0x000055b919bc7761 in PostgresMain (argc=1, argv=0x55b91b615db0,
dbname=0x55b91b615c50 "conetdb", username=0x55b91b615c38 "root") at
postgres.c:3917
#17 0x000055b919b1f4db in BackendRun (port=0x55b91b61bab0) at
postmaster.c:4521
#18 0x000055b919b1ebdf in BackendStartup (port=0x55b91b61bab0) at
postmaster.c:4204
#19 0x000055b919b1ae50 in ServerLoop () at postmaster.c:1729
#20 0x000055b919b1a5ff in PostmasterMain (argc=1, argv=0x55b91b5e21d0) at
postmaster.c:1402
#21 0x000055b919a3be37 in main (argc=1, argv=0x55b91b5e21d0) at main.c:228

After analysis, I confirmed that a new connection has to rebuild the
relcache.
While loading the critical indexes, RelationBuildTupleDesc scans
pg_attribute sequentially, because the index it would normally use is itself
not loaded yet.
However, VACUUM FULL moved the pg_attribute rows describing those critical
system indexes to the end of the file.
So nearly the whole of pg_attribute has to be read before the scan finds
them, whereas previously only a few pages at the start had to be scanned.
The following query shows where those rows ended up:

test=> select ctid, attrelid, attname,attnum from pg_attribute where
attrelid in ('pg_class_oid_index'::regclass,
'pg_attribute_relid_attnum_index'::regclass,
'pg_index_indexrelid_index'::regclass, 'pg_opclass_oid_index'::regclass,
'pg_amproc_fam_proc_index'::regclass,
'pg_rewrite_rel_rulename_index'::regclass,
'pg_trigger_tgrelid_tgname_index'::regclass);
ctid | attrelid | attname | attnum
--------------+----------+-----------------+--------
(2040908,37) | 2655 | amprocfamily | 1
(2040908,38) | 2655 | amproclefttype | 2
(2040908,39) | 2655 | amprocrighttype | 3
(2040908,40) | 2655 | amprocnum | 4
(2040908,47) | 2659 | attrelid | 1
(2040908,48) | 2659 | attnum | 2
(2040909,3) | 2662 | oid | 1
(2040909,46) | 2679 | indexrelid | 1
(2040910,8) | 2687 | oid | 1
(2040910,24) | 2693 | ev_class | 1
(2040910,25) | 2693 | rulename | 2
(2040910,47) | 2701 | tgrelid | 1
(2040910,48) | 2701 | tgname | 2
(13 rows)
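
For scale, those ctid page numbers can be compared against the total size
of the heap. This query is not part of the original report, but
pg_relation_size and the block_size setting are standard:

-- Total number of heap pages in pg_attribute; the ctid page numbers
-- above (around 2040909) show how deep into the heap the critical
-- rows were moved.
select pg_relation_size('pg_attribute') / current_setting('block_size')::int
    as total_pages;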

So I wonder whether VACUUM FULL on system catalogs should be optimized to
behave like a CLUSTER action by default, so that these rows end up near the
start of the file again.
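
As an untested aside (my suggestion, not verified on 14.8): since the
ordering of pg_attribute_relid_attnum_index starts with attrelid, and
system relations have low OIDs, rewriting the catalog in index order should
put the critical rows back near the start of the file, if your build
permits clustering this catalog:

-- Untested suggestion (superuser only): rewrite pg_attribute in
-- (attrelid, attnum) order so rows for low-OID system relations,
-- including the critical indexes, land near the start of the heap.
CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;

Note also that only the first connection after the restart should pay the
full cost: once that backend finishes building the relcache, it rewrites
the relcache init file (pg_internal.init), which later backends can load
directly.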
