BUG #18559: Crash after detaching a partition concurrently from another session

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: kuntalghosh(dot)2007(at)gmail(dot)com
Subject: BUG #18559: Crash after detaching a partition concurrently from another session
Date: 2024-07-30 13:47:15
Message-ID: 18559-b48286d2eacd9a4e@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18559
Logged by: Kuntal Ghosh
Email address: kuntalghosh(dot)2007(at)gmail(dot)com
PostgreSQL version: 17beta2
Operating system: AL2
Description:

I've encountered the following crash when a partition is detached
concurrently and then dropped while another session executes a prepared
statement on the parent table.

#0 0x0000000000900e5f in heap_getattr (tup=0x0, attnum=33,
tupleDesc=0x7f40db0a5458, isnull=0x7ffcb110197e) at
../../../src/include/access/htup_details.h:801
801 if (attnum > (int)
HeapTupleHeaderGetNatts(tup->t_data))
(gdb) bt
#0 0x0000000000900e5f in heap_getattr (tup=0x0, attnum=33,
tupleDesc=0x7f40db0a5458, isnull=0x7ffcb110197e) at
../../../src/include/access/htup_details.h:801
#1 0x000000000090123b in RelationBuildPartitionDesc (rel=0x7f40db0b68e8,
omit_detached=true) at partdesc.c:237
#2 0x0000000000900fe0 in RelationGetPartitionDesc (rel=0x7f40db0b68e8,
omit_detached=true) at partdesc.c:109
#3 0x0000000000901889 in PartitionDirectoryLookup (pdir=0x24287e8,
rel=0x7f40db0b68e8) at partdesc.c:457
#4 0x00000000008e77c3 in set_relation_partition_info (root=0x241c308,
rel=0x241d518, relation=0x7f40db0b68e8) at plancat.c:2367
#5 0x00000000008e48c6 in get_relation_info (root=0x241c308,
relationObjectId=16388, inhparent=true, rel=0x241d518) at plancat.c:554
#6 0x00000000008eb8b7 in build_simple_rel (root=0x241c308, relid=1,
parent=0x0) at relnode.c:340
#7 0x000000000089f007 in add_base_rels_to_query (root=0x241c308,
jtnode=0x241be90) at initsplan.c:165
#8 0x000000000089f04e in add_base_rels_to_query (root=0x241c308,
jtnode=0x241c238) at initsplan.c:173
#9 0x00000000008a5363 in query_planner (root=0x241c308,
qp_callback=0x8aba74 <standard_qp_callback>, qp_extra=0x7ffcb1101da0) at
planmain.c:170
#10 0x00000000008a7d88 in grouping_planner (root=0x241c308,
tuple_fraction=0, setops=0x0) at planner.c:1520
#11 0x00000000008a74b0 in subquery_planner (glob=0x241b988, parse=0x241d1f8,
parent_root=0x0, hasRecursion=false, tuple_fraction=0, setops=0x0) at
planner.c:1089
#12 0x00000000008a5ae7 in standard_planner (parse=0x241d1f8,
query_string=0x23732d8 "prepare p1 as select * from p;", cursorOptions=2048,
boundParams=0x0) at planner.c:415
#13 0x00000000008a587e in planner (parse=0x241d1f8, query_string=0x23732d8
"prepare p1 as select * from p;", cursorOptions=2048, boundParams=0x0) at
planner.c:282
#14 0x00000000009e7dbc in pg_plan_query (querytree=0x241d1f8,
query_string=0x23732d8 "prepare p1 as select * from p;", cursorOptions=2048,
boundParams=0x0) at postgres.c:904
#15 0x00000000009e7eed in pg_plan_queries (querytrees=0x241c2b8,
query_string=0x23732d8 "prepare p1 as select * from p;", cursorOptions=2048,
boundParams=0x0) at postgres.c:996
#16 0x0000000000b9e50f in BuildCachedPlan (plansource=0x2374270,
qlist=0x241c2b8, boundParams=0x0, queryEnv=0x0) at plancache.c:962
#17 0x0000000000b9eaeb in GetCachedPlan (plansource=0x2374270,
boundParams=0x0, owner=0x0, queryEnv=0x0) at plancache.c:1199
#18 0x00000000006cfd2c in ExecuteQuery (pstate=0x2372ed8, stmt=0x2349130,
intoClause=0x0, params=0x0, dest=0x2372e48, qc=0x7ffcb1102630) at
prepare.c:193
#19 0x00000000009f0c2c in standard_ProcessUtility (pstmt=0x23491e0,
queryString=0x2348720 "execute p1;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2372e48,
qc=0x7ffcb1102630)
at utility.c:750
#20 0x00000000009f061b in ProcessUtility (pstmt=0x23491e0,
queryString=0x2348720 "execute p1;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2372e48,
qc=0x7ffcb1102630)
at utility.c:523
#21 0x00000000009ef237 in PortalRunUtility (portal=0x23c8100,
pstmt=0x23491e0, isTopLevel=true, setHoldSnapshot=true, dest=0x2372e48,
qc=0x7ffcb1102630) at pquery.c:1158
#22 0x00000000009eefa0 in FillPortalStore (portal=0x23c8100,
isTopLevel=true) at pquery.c:1031
#23 0x00000000009ee90e in PortalRun (portal=0x23c8100,
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x23495a0,
altdest=0x23495a0, qc=0x7ffcb1102800) at pquery.c:763
#24 0x00000000009e83e6 in exec_simple_query (query_string=0x2348720 "execute
p1;") at postgres.c:1274
#25 0x00000000009ecb0b in PostgresMain (dbname=0x2381fa8 "postgres",
username=0x2381f88 "kuntalgh") at postgres.c:4696
#26 0x00000000009e4c0a in BackendMain (startup_data=0x7ffcb1102b0c "",
startup_data_len=4) at backend_startup.c:107
#27 0x0000000000910ea3 in postmaster_child_launch (child_type=B_BACKEND,
startup_data=0x7ffcb1102b0c "", startup_data_len=4,
client_sock=0x7ffcb1102b30) at launch_backend.c:274
#28 0x0000000000916661 in BackendStartup (client_sock=0x7ffcb1102b30) at
postmaster.c:3495
#29 0x0000000000913d7c in ServerLoop () at postmaster.c:1662
#30 0x0000000000913736 in PostmasterMain (argc=3, argv=0x2342ea0) at
postmaster.c:1360
#31 0x00000000007d2e9f in main (argc=3, argv=0x2342ea0) at main.c:197

I've reproduced the issue by following [1] with minor modification.

1. ./configure --enable-debug --enable-depend --enable-cassert CFLAGS=-O0
2. make -j; make install -j; initdb -D ./primary; pg_ctl -D ./primary -l logfile start
3. alter system set plan_cache_mode to 'force_generic_plan'; select pg_reload_conf();
4. create table p (a int, b int) partition by range (a);
   create table p1 partition of p for values from (0) to (1);
   create table p2 partition of p for values from (1) to (2);

Now, we need to use GDB to reproduce the crash.

Session 1:
1. Attach GDB and put a breakpoint at ATExecDetachPartition

Session 2:
1. SQL: prepare p1 as select * from p;
2. Attach GDB and put a breakpoint at ProcessUtility() and
find_inheritance_children_extended()

Session 1:
1. alter table p detach partition p2 concurrently;
2. The session will be stalled at ATExecDetachPartition. Continue stepping
next till CommitTransactionCommand();

Session 2:
1. SQL: execute p1;
2. The session will be stalled at ProcessUtility(). Before that, it takes
the snapshot.

Session 1:
1. Continue till DetachPartitionFinalize.

Session 2:
1. Continue till find_inheritance_children_extended(). It'll find two
partitions, because session 1's first transaction had not yet committed when
session 2 took its snapshot. Complete the execution in that function.

Session 1:
1. Run to completion.
2. SQL: drop table p2;

Session 2:
1. Continue execution. It will crash, as it assumes that a pg_class entry
exists for the dropped relation.
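
The interleaving above can be pictured with a small toy model. This is only a sketch under stated assumptions, not PostgreSQL code: ToyPartition, child_listed, class_row_exists and count_dangling are hypothetical names, and events are reduced to logical timestamps. It assumes the child list is filtered through the reader's snapshot while the catalog row lookup sees the latest committed state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NEVER (-1) /* the event has not happened */

/* Hypothetical toy model of one partition; not a PostgreSQL structure. */
typedef struct {
    int oid;                /* made-up identifier */
    int detach_commit_time; /* logical time the detach committed, or NEVER */
    int drop_time;          /* logical time the catalog row was removed, or NEVER */
} ToyPartition;

/* Assumption: the child list is filtered through the reader's snapshot, so a
 * detach is honored only if it committed at or before the snapshot time. */
static bool child_listed(const ToyPartition *p, int snapshot_time)
{
    return p->detach_commit_time == NEVER ||
           p->detach_commit_time > snapshot_time;
}

/* Assumption: the catalog row lookup sees the latest committed state. */
static bool class_row_exists(const ToyPartition *p, int now)
{
    return p->drop_time == NEVER || p->drop_time > now;
}

/* Count children that are still listed but whose catalog row is gone. */
int count_dangling(const ToyPartition *parts, size_t n,
                   int snapshot_time, int now)
{
    int dangling = 0;

    assert(parts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (child_listed(&parts[i], snapshot_time) &&
            !class_row_exists(&parts[i], now))
            dangling++;
    }
    return dangling;
}
```

In this model, a partition whose detach commits at time 2 and which is dropped at time 4 is reported as dangling to a reader whose snapshot dates from time 1 and who looks up the row at time 5. A reader with a snapshot from time 3 no longer lists it at all.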

The code in RelationBuildPartitionDesc (partdesc.c:237, frame #1 above)
assumes that a pg_class entry for the detached partition will always be
available, which is wrong.
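
To illustrate the kind of check that is missing, here is a minimal, self-contained sketch. It is not the PostgreSQL source and not a proposed patch: ToyClassRow, toy_lookup and toy_get_partbound are hypothetical names, and the catalog is modeled as a plain array. The point is only that a lookup which may return NULL has to be tested before any attribute is read from the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a catalog row; not PostgreSQL's pg_class layout. */
typedef struct {
    int oid;
    const char *partbound;
} ToyClassRow;

/* Return the row with the given oid, or NULL if it no longer exists. */
static const ToyClassRow *toy_lookup(const ToyClassRow *rows, size_t n, int oid)
{
    for (size_t i = 0; i < n; i++) {
        if (rows[i].oid == oid)
            return &rows[i];
    }
    return NULL;
}

/* Fetch the partition bound, failing cleanly when the row is missing instead
 * of dereferencing a NULL pointer. Returns 0 on success, -1 otherwise. */
int toy_get_partbound(const ToyClassRow *rows, size_t n, int oid,
                      const char **bound)
{
    const ToyClassRow *row;

    assert(bound != NULL);
    row = toy_lookup(rows, n, oid);
    if (row == NULL) {
        fprintf(stderr, "could not find catalog row for oid %d\n", oid);
        return -1;
    }
    *bound = row->partbound;
    return 0;
}
```

Looking up an oid that is absent from the array returns -1 and leaves *bound untouched. In the server, such a condition would presumably be reported through its normal error mechanism rather than a return code.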

Thanks,
Kuntal
[1] https://www.postgresql.org/message-id/CAHewXNkaKgVmT%2BOkVA9UHrEYm%2Bb8J6o_8%2B-84Qey6V5tM-%2Bz9A%40mail.gmail.com
