Re: BUG #16811: Severe reproducible server backend crash

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, james(dot)inform(at)pharmapp(dot)de, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16811: Severe reproducible server backend crash
Date: 2021-01-07 17:22:44
Message-ID: 208316.1610040164@sss.pgh.pa.us
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> Thanks for the report. I happened to have DBeaver here and could
> reproduce this, and got the following core:

I can reproduce it without anything extra. What's needed is to run
the problematic statement in extended query mode, which you can
do like this:

$ cat foo.sql
do $$ begin rollback; end $$;

$ pgbench -n -f foo.sql -M prepared
pgbench: error: client 0 aborted in command 0 (SQL) of script 0; perhaps the backend died while processing

That lnext() should certainly not find pstmt->stmts to be NIL,
seeing that we are inside a loop over that list. Ergo, something
is clobbering this active portal. A bit of gdb'ing says the clobber
happens here:

#0 AtAbort_Portals () at portalmem.c:833
(this appears to be inlined code from PortalReleaseCachedPlan)
#1 0x00000000005a4ce2 in AbortTransaction () at xact.c:2711
#2 0x00000000005a55d5 in AbortCurrentTransaction () at xact.c:3322
#3 0x00000000006d1557 in _SPI_rollback (chain=<optimized out>) at spi.c:326
#4 0x00007feef9e851c5 in exec_stmt_rollback (stmt=0x2babca8,
    estate=0x7fff35e55ee0) at pl_exec.c:4961
#5 exec_stmts (estate=0x7fff35e55ee0, stmts=0x2babd80) at pl_exec.c:2081
#6 0x00007feef9e863cb in exec_stmt_block (estate=0x7fff35e55ee0,
    block=0x2babdd8) at pl_exec.c:1904
#7 0x00007feef9e864bb in exec_toplevel_block (
    estate=estate@entry=0x7fff35e55ee0, block=0x2babdd8) at pl_exec.c:1602
#8 0x00007feef9e86ced in plpgsql_exec_function (func=func@entry=0x2ba7c60,
    fcinfo=fcinfo@entry=0x7fff35e56060,
    simple_eval_estate=simple_eval_estate@entry=0x2bad6b0,
    simple_eval_resowner=simple_eval_resowner@entry=0x2b12e40,
    atomic=<optimized out>) at pl_exec.c:605
#9 0x00007feef9e8fd58 in plpgsql_inline_handler (fcinfo=<optimized out>)
    at pl_handler.c:344
#10 0x000000000091a540 in FunctionCall1Coll (flinfo=0x7fff35e561f0,
    collation=<optimized out>, arg1=<optimized out>) at fmgr.c:1141
#11 0x000000000091aaa9 in OidFunctionCall1Coll (functionId=<optimized out>,
    collation=collation@entry=0, arg1=45120272) at fmgr.c:1419
#12 0x000000000064df7e in ExecuteDoStmt (stmt=stmt@entry=0x2b07ed8,
    atomic=atomic@entry=false) at functioncmds.c:2027
#13 0x000000000080fa14 in standard_ProcessUtility (pstmt=0x2b07e40,
    queryString=0x2b079a0 "do $$ begin rollback; end $$;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0xa90540 <donothingDR>, qc=0x7fff35e56630) at utility.c:696
#14 0x000000000080d044 in PortalRunUtility (portal=0x2b47240, pstmt=0x2b07e40,
    isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>,
    dest=0xa90540 <donothingDR>, qc=0x7fff35e56630) at pquery.c:1159
#15 0x000000000080db24 in PortalRunMulti (portal=portal@entry=0x2b47240,
    isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=0xa90540 <donothingDR>,
    dest@entry=0x2adfa88, altdest=0xa90540 <donothingDR>,
    altdest@entry=0x2adfa88, qc=qc@entry=0x7fff35e56630) at pquery.c:1311
#16 0x000000000080e937 in PortalRun (portal=portal@entry=0x2b47240,
    count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
    run_once=run_once@entry=true, dest=dest@entry=0x2adfa88,
    altdest=altdest@entry=0x2adfa88, qc=0x7fff35e56630) at pquery.c:779
#17 0x000000000080c77b in exec_execute_message (max_rows=9223372036854775807,
    portal_name=0x2adf670 "") at postgres.c:2196
#18 PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7fff35e569c0,
    dbname=<optimized out>, username=<optimized out>) at postgres.c:4452

So I would say that the conditions under which AtAbort_Portals
decides that it can destroy a portal rather than just mark it failed
need to be reconsidered. It's not clear to me exactly how that
should change though. Maybe Peter has more insight.

regards, tom lane
