| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, james(dot)inform(at)pharmapp(dot)de, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: BUG #16811: Severe reproducible server backend crash |
| Date: | 2021-01-07 17:22:44 |
| Message-ID: | 208316.1610040164@sss.pgh.pa.us |
| Lists: | pgsql-bugs |
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> Thanks for the report. I happened to have DBeaver here and could
> reproduce this, and got the following core:

I can reproduce it without anything extra. What's needed is to run
the problematic statement in extended query mode, which you can
do like this:

$ cat foo.sql
do $$ begin rollback; end $$;
$ pgbench -n -f foo.sql -M prepared
pgbench: error: client 0 aborted in command 0 (SQL) of script 0; perhaps the backend died while processing

That lnext() should certainly not find pstmt->stmts to be NIL,
seeing that we are inside a loop over that list. Ergo, something
is clobbering this active portal. A bit of gdb'ing says the clobber
happens here:
#0 AtAbort_Portals () at portalmem.c:833
(this appears to be inlined code from PortalReleaseCachedPlan)
#1 0x00000000005a4ce2 in AbortTransaction () at xact.c:2711
#2 0x00000000005a55d5 in AbortCurrentTransaction () at xact.c:3322
#3 0x00000000006d1557 in _SPI_rollback (chain=<optimized out>) at spi.c:326
#4 0x00007feef9e851c5 in exec_stmt_rollback (stmt=0x2babca8,
    estate=0x7fff35e55ee0) at pl_exec.c:4961
#5 exec_stmts (estate=0x7fff35e55ee0, stmts=0x2babd80) at pl_exec.c:2081
#6 0x00007feef9e863cb in exec_stmt_block (estate=0x7fff35e55ee0,
    block=0x2babdd8) at pl_exec.c:1904
#7 0x00007feef9e864bb in exec_toplevel_block (
    estate=estate@entry=0x7fff35e55ee0, block=0x2babdd8) at pl_exec.c:1602
#8 0x00007feef9e86ced in plpgsql_exec_function (func=func@entry=0x2ba7c60,
    fcinfo=fcinfo@entry=0x7fff35e56060,
    simple_eval_estate=simple_eval_estate@entry=0x2bad6b0,
    simple_eval_resowner=simple_eval_resowner@entry=0x2b12e40,
    atomic=<optimized out>) at pl_exec.c:605
#9 0x00007feef9e8fd58 in plpgsql_inline_handler (fcinfo=<optimized out>)
    at pl_handler.c:344
#10 0x000000000091a540 in FunctionCall1Coll (flinfo=0x7fff35e561f0,
    collation=<optimized out>, arg1=<optimized out>) at fmgr.c:1141
#11 0x000000000091aaa9 in OidFunctionCall1Coll (functionId=<optimized out>,
    collation=collation@entry=0, arg1=45120272) at fmgr.c:1419
#12 0x000000000064df7e in ExecuteDoStmt (stmt=stmt@entry=0x2b07ed8,
    atomic=atomic@entry=false) at functioncmds.c:2027
#13 0x000000000080fa14 in standard_ProcessUtility (pstmt=0x2b07e40,
    queryString=0x2b079a0 "do $$ begin rollback; end $$;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0xa90540 <donothingDR>, qc=0x7fff35e56630) at utility.c:696
#14 0x000000000080d044 in PortalRunUtility (portal=0x2b47240, pstmt=0x2b07e40,
    isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>,
    dest=0xa90540 <donothingDR>, qc=0x7fff35e56630) at pquery.c:1159
#15 0x000000000080db24 in PortalRunMulti (portal=portal@entry=0x2b47240,
    isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=0xa90540 <donothingDR>,
    dest@entry=0x2adfa88, altdest=0xa90540 <donothingDR>,
    altdest@entry=0x2adfa88, qc=qc@entry=0x7fff35e56630) at pquery.c:1311
#16 0x000000000080e937 in PortalRun (portal=portal@entry=0x2b47240,
    count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
    run_once=run_once@entry=true, dest=dest@entry=0x2adfa88,
    altdest=altdest@entry=0x2adfa88, qc=0x7fff35e56630) at pquery.c:779
#17 0x000000000080c77b in exec_execute_message (max_rows=9223372036854775807,
    portal_name=0x2adf670 "") at postgres.c:2196
#18 PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7fff35e569c0,
    dbname=<optimized out>, username=<optimized out>) at postgres.c:4452
So I would say that the conditions under which AtAbort_Portals
decides that it can destroy a portal rather than just mark it failed
need to be reconsidered. It's not clear to me exactly how that
should change though. Maybe Peter has more insight.
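
For readers who want the hazard in miniature, here is a toy Python model of it. It is not PostgreSQL code. `Portal`, `abort_portals`, `run_multi`, and the `spare_active` switch are all hypothetical names invented for illustration, and "mark failed but keep the statement list" is only one conceivable policy, not a claim about what the eventual fix looks like. The sketch just shows why an abort hook that tears down the state of a portal that is still executing breaks the loop that called into it.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Status(Enum):
    READY = auto()
    ACTIVE = auto()
    FAILED = auto()


@dataclass
class Portal:
    """Toy stand-in for a portal (hypothetical, not the real struct)."""
    stmts: Optional[List[str]]
    status: Status = Status.READY


def abort_portals(portals, spare_active):
    """Toy abort hook run when a transaction is rolled back."""
    for p in portals:
        if spare_active and p.status is Status.ACTIVE:
            # Assumed alternative policy: only mark the running portal
            # failed and leave its statement list for the caller.
            p.status = Status.FAILED
            continue
        # "Destroy" behaviour: drop the statement list outright.
        p.stmts = None
        p.status = Status.FAILED


def run_multi(portal, portals, spare_active):
    """Loop over the portal's statements; 'rollback' triggers the abort hook."""
    portal.status = Status.ACTIVE
    executed = []
    i = 0
    while i < len(portal.stmts):  # fails if stmts was dropped mid-loop
        stmt = portal.stmts[i]
        executed.append(stmt)
        if stmt == "rollback":
            abort_portals(portals, spare_active)
        i += 1
    return executed
```

With `spare_active=False`, the loop's next look at `portal.stmts` raises a `TypeError`, the toy analogue of the backend crash. With `spare_active=True`, the loop finishes and the portal is left in the failed state.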
regards, tom lane