From: | Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
Cc: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18559: Crash after detaching a partition concurrently from another session |
Date: | 2024-08-12 22:33:09 |
Message-ID: | 202408122233.bo4adt3vh5bi@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 2024-Aug-12, Alvaro Herrera from 2ndQuadrant wrote:
> On 2024-Aug-10, Junwang Zhao wrote:
>
> > > IIUC, a simple fix would be to retry if an entry is not found. Attached a patch.
> >
> > I can reproduce the issue, and the patch LGTM.
>
> Interesting issue, thanks for reporting and putting together a
> reproducer. I have added some comments to the proposed patch, so here's
> a v2 for it. I'm going to write a commit message for it and push to all
> branches since 14.
Dept. of second thoughts. I couldn't find any reason why it's okay to
dereference the return value from systable_getnext() without verifying
that it's not NULL, so I added the check to the older branches too. As
far as I know we've never had a crash report that could be traced to
lack of that check, but that code still looks like it's assuming a
little too much.
One more thing here. Applying the test scripts that I used for the
previous bug with addition of DROP after the DETACH CONCURRENTLY, I get
a different failure during planning, which reports this error:
ERROR: could not open relation with OID 457639
STATEMENT: select * from p where a = $1;
and changing that error (in relation_open) from ERROR to PANIC would
result in this backtrace:
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo(at)entry=6, no_tid=no_tid(at)entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007f149c3cae8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007f149c37bfb2 in __GI_raise (sig=sig(at)entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007f149c366472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x0000557db57b8611 in errfinish (filename=<optimized out>, lineno=61, funcname=0x557db5816128 <__func__.1> "relation_open")
at ../../../../../../pgsql/source/master/src/backend/utils/error/elog.c:599
#5 0x0000557db52441ad in relation_open (relationId=17466, lockmode=lockmode(at)entry=1)
at ../../../../../../pgsql/source/master/src/backend/access/common/relation.c:61
#6 0x0000557db537d1d9 in table_open (relationId=<optimized out>, lockmode=lockmode(at)entry=1)
at ../../../../../../pgsql/source/master/src/backend/access/table/table.c:44
#7 0x0000557db55b0cf8 in expand_partitioned_rtentry (root=root(at)entry=0x557db9c8b530, relinfo=relinfo(at)entry=0x557db9c8c2d8,
parentrte=parentrte(at)entry=0x557db9c8c178, parentRTindex=parentRTindex(at)entry=1, parentrel=parentrel(at)entry=0x7f149bc46060, parent_updatedCols=0x0,
top_parentrc=0x0, lockmode=1) at ../../../../../../pgsql/source/master/src/backend/optimizer/util/inherit.c:390
#8 0x0000557db55b121f in expand_inherited_rtentry (root=root(at)entry=0x557db9c8b530, rel=0x557db9c8c2d8, rte=0x557db9c8c178, rti=rti(at)entry=1)
at ../../../../../../pgsql/source/master/src/backend/optimizer/util/inherit.c:154
#9 0x0000557db558e11a in add_other_rels_to_query (root=root(at)entry=0x557db9c8b530)
at ../../../../../../pgsql/source/master/src/backend/optimizer/plan/initsplan.c:214
#10 0x0000557db5591983 in query_planner (root=root(at)entry=0x557db9c8b530, qp_callback=qp_callback(at)entry=0x557db5593470 <standard_qp_callback>,
qp_extra=qp_extra(at)entry=0x7ffd987c39e0) at ../../../../../../pgsql/source/master/src/backend/optimizer/plan/planmain.c:268
#11 0x0000557db5597968 in grouping_planner (root=root(at)entry=0x557db9c8b530, tuple_fraction=<optimized out>, tuple_fraction(at)entry=0, setops=setops(at)entry=0x0)
at ../../../../../../pgsql/source/master/src/backend/optimizer/plan/planner.c:1520
#12 0x0000557db559a8f3 in subquery_planner (glob=glob(at)entry=0x557db9c8c758, parse=parse(at)entry=0x557db9c8bf68, parent_root=parent_root(at)entry=0x0,
hasRecursion=hasRecursion(at)entry=false, tuple_fraction=tuple_fraction(at)entry=0, setops=setops(at)entry=0x0)
at ../../../../../../pgsql/source/master/src/backend/optimizer/plan/planner.c:1089
#13 0x0000557db559acea in standard_planner (parse=0x557db9c8bf68, query_string=<optimized out>, cursorOptions=2048, boundParams=0x0)
at ../../../../../../pgsql/source/master/src/backend/optimizer/plan/planner.c:415
#14 0x0000557db5677117 in pg_plan_query (querytree=querytree(at)entry=0x557db9c8bf68, query_string=query_string(at)entry=0x557db9cb34c8 "select * from p where a = $1;",
cursorOptions=cursorOptions(at)entry=2048, boundParams=boundParams(at)entry=0x0) at ../../../../../pgsql/source/master/src/backend/tcop/postgres.c:912
#15 0x0000557db5677273 in pg_plan_queries (querytrees=querytrees(at)entry=0x557db9d69268, query_string=0x557db9cb34c8 "select * from p where a = $1;",
cursorOptions=2048, boundParams=boundParams(at)entry=0x0) at ../../../../../pgsql/source/master/src/backend/tcop/postgres.c:1006
#16 0x0000557db57a39b3 in BuildCachedPlan (plansource=plansource(at)entry=0x557db9c5ff10, qlist=qlist(at)entry=0x557db9d69268, boundParams=boundParams(at)entry=0x0,
queryEnv=queryEnv(at)entry=0x0) at ../../../../../../pgsql/source/master/src/backend/utils/cache/plancache.c:962
#17 0x0000557db57a4204 in GetCachedPlan (plansource=plansource(at)entry=0x557db9c5ff10, boundParams=boundParams(at)entry=0x557db9cc0778, owner=owner(at)entry=0x0,
queryEnv=queryEnv(at)entry=0x0) at ../../../../../../pgsql/source/master/src/backend/utils/cache/plancache.c:1199
#18 0x0000557db56786c5 in exec_bind_message (input_message=0x7ffd987c3e70) at ../../../../../pgsql/source/master/src/backend/tcop/postgres.c:2017
#19 PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../pgsql/source/master/src/backend/tcop/postgres.c:4814
Clearly this is undesirable, but I'm not sure how strongly we need to
pursue a fix for it. It seems hard to forbid dropping the table after
the detach. Is it enough to advise users to not drop partitions
immediately after detaching them? Is this a sign of a more fundamental
problem in the mechanism for DETACH CONCURRENTLY when used together with
prepared statements?
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees." (E. Dijkstra)
From | Date | Subject | |
---|---|---|---|
Next Message | Jason Kim | 2024-08-13 04:17:51 | a row is not inserted in nested INSERT ON CONFLICT |
Previous Message | Cameron Vogt | 2024-08-12 21:19:22 | TLS session tickets disabled? |