Re: BUG #18334: Segfault when running a query with parallel workers

From: Marcin Barczyński <mba(dot)ogolny(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Marcin Barczyński <mba(dot)ogolny(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18334: Segfault when running a query with parallel workers
Date: 2024-05-23 11:59:46
Message-ID: CAP3o3Pcv+Mo0Vmo_A8Ev7mOU1qwdSFvxSBMhL3-axESsruhTNw@mail.gmail.com
Lists: pgsql-bugs

Hi Thomas,

On Sun, Feb 11, 2024 at 10:31 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Could you please show EXPLAIN ANALYZE for the query? In gdb from that
> core, can you please show "info proc mappings", and in frame 0 "print
> *area", and in frame 1, "print *tuple" and "print *hashtable"?

Sorry for the late reply.
It happened again, so I'm pasting the information you requested from the core dump.
This is PostgreSQL 13.15.

Stack trace:

#0 0x000056134d5bb011 in dsa_free (area=0x56134e07d718, dp=<optimized
out>) at utils/mmgr/./build/../src/backend/utils/mmgr/dsa.c:840
840 utils/mmgr/./build/../src/backend/utils/mmgr/dsa.c: No such file
or directory.
(gdb) bt
#0 0x000056134d5bb011 in dsa_free (area=0x56134e07d718, dp=<optimized
out>) at utils/mmgr/./build/../src/backend/utils/mmgr/dsa.c:840
#1 0x000056134d2d6a0c in ExecHashTableDetachBatch
(hashtable=hashtable@entry=0x56134e154540) at
executor/./build/../src/backend/executor/nodeHash.c:3181
#2 0x000056134d2d821a in ExecParallelHashJoinNewBatch
(hjstate=0x56134e087b48) at
executor/./build/../src/backend/executor/nodeHashjoin.c:1131
#3 ExecHashJoinImpl (parallel=<optimized out>, pstate=<optimized
out>) at executor/./build/../src/backend/executor/nodeHashjoin.c:590
#4 ExecParallelHashJoin (pstate=<optimized out>) at
executor/./build/../src/backend/executor/nodeHashjoin.c:637
#5 0x000056134d2bbffd in ExecProcNodeInstr (node=0x56134e087b48) at
executor/./build/../src/backend/executor/execProcnode.c:467
#6 0x000056134d2b1bbd in ExecProcNode (node=0x56134e087b48) at
executor/./build/../src/include/executor/executor.h:248
#7 ExecutePlan (execute_once=<optimized out>, dest=0x56134dfe1fe8,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x56134e087b48, estate=0x56134e087858) at
executor/./build/../src/backend/executor/execMain.c:1632
#8 standard_ExecutorRun (queryDesc=0x56134e0783e0,
direction=<optimized out>, count=0, execute_once=<optimized out>) at
executor/./build/../src/backend/executor/execMain.c:350
#9 0x00007f3a734c9f25 in pgss_ExecutorRun (queryDesc=0x56134e0783e0,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>)
at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1045
#10 0x00007f3a771296d2 in explain_ExecutorRun
(queryDesc=0x56134e0783e0, direction=ForwardScanDirection, count=0,
execute_once=<optimized out>)
at ./build/../contrib/auto_explain/auto_explain.c:334
#11 0x000056134d2b8729 in ExecutorRun (execute_once=true,
count=<optimized out>, direction=ForwardScanDirection,
queryDesc=0x56134e0783e0)
at executor/./build/../src/backend/executor/execMain.c:292
#12 ParallelQueryMain (seg=seg@entry=0x56134df98db8,
toc=toc@entry=0x7f321dfa4000) at
executor/./build/../src/backend/executor/execParallel.c:1448
#13 0x000056134d1767ce in ParallelWorkerMain (main_arg=<optimized
out>) at access/transam/./build/../src/backend/access/transam/parallel.c:1494
#14 0x000056134d3b981a in StartBackgroundWorker () at
postmaster/./build/../src/backend/postmaster/bgworker.c:890
#15 0x000056134d3c963e in do_start_bgworker (rw=<optimized out>) at
postmaster/./build/../src/backend/postmaster/postmaster.c:5896
#16 maybe_start_bgworkers () at
postmaster/./build/../src/backend/postmaster/postmaster.c:6121
#17 0x000056134d3c988d in sigusr1_handler
(postgres_signal_arg=<optimized out>) at
postmaster/./build/../src/backend/postmaster/postmaster.c:5281
#18 <signal handler called>
#19 0x00007f3a761ac59d in __GI___select (nfds=nfds@entry=8,
readfds=readfds@entry=0x7fff97c44720, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fff97c44680)
at ../sysdeps/unix/sysv/linux/select.c:69
#20 0x000056134d3caa16 in ServerLoop () at
postmaster/./build/../src/backend/postmaster/postmaster.c:1706
#21 0x000056134d3cc725 in PostmasterMain (argc=5, argv=<optimized
out>) at postmaster/./build/../src/backend/postmaster/postmaster.c:1415
#22 0x000056134d0e0377 in main (argc=5, argv=0x56134de8d300) at
main/./build/../src/backend/main/main.c:210

(gdb) info proc mappings
Mapped address spaces:

Start Addr End Addr Size Offset objfile
0x56134cfab000 0x56134d068000 0xbd000 0x0
/usr/lib/postgresql/13/bin/postgres
0x56134d068000 0x56134d60b000 0x5a3000 0xbd000
/usr/lib/postgresql/13/bin/postgres
0x56134d60b000 0x56134d827000 0x21c000 0x660000
/usr/lib/postgresql/13/bin/postgres
0x56134d827000 0x56134d845000 0x1e000 0x87b000
/usr/lib/postgresql/13/bin/postgres
0x56134d845000 0x56134d854000 0xf000 0x899000
/usr/lib/postgresql/13/bin/postgres
0x7f2e9599e000 0x7f2f1599e000 0x80000000 0x0
/dev/shm/PostgreSQL.940706000

(gdb) print *area
$1 = {control = 0x7f321dfa4500, mapping_pinned = false, segment_maps =
{{segment = 0x0, mapped_address = 0x7f321dfa4500 "", header =
0x7f321dfa4500, fpm = 0x7f321dfa5d20,
pagemap = 0x7f321dfa6168}, {segment = 0x56134dfa1ec8,
mapped_address = 0x7f3216cd8000 "", header = 0x7f3216cd8000, fpm =
0x7f3216cd8038, pagemap = 0x7f3216cd8480}, {
segment = 0x56134dfa1f18, mapped_address = 0x7f31f6bd7000 "",
header = 0x7f31f6bd7000, fpm = 0x7f31f6bd7038, pagemap =
0x7f31f6bd7480}, {segment = 0x56134dfa2078,
mapped_address = 0x7f30d60a6000 "", header = 0x7f30d60a6000, fpm
= 0x7f30d60a6038, pagemap = 0x7f30d60a6480}, {segment =
0x56134dfa2118, mapped_address = 0x7f30d58a6000 "",
header = 0x7f30d58a6000, fpm = 0x7f30d58a6038, pagemap =
0x7f30d58a6480}, {segment = 0x56134dfa20c8, mapped_address =
0x7f30d5ca6000 "", header = 0x7f30d5ca6000, fpm = 0x7f30d5ca6038,
pagemap = 0x7f30d5ca6480}, {segment = 0x56134dfa2168,
mapped_address = 0x7f30d50a6000 "", header = 0x7f30d50a6000, fpm =
0x7f30d50a6038, pagemap = 0x7f30d50a6480}, {
segment = 0x56134dfa21b8, mapped_address = 0x7f30d449e000 "",
header = 0x7f30d449e000, fpm = 0x7f30d449e038, pagemap =
0x7f30d449e480}, {segment = 0x56134dfa2208,
mapped_address = 0x7f30d2c90000 "", header = 0x7f30d2c90000, fpm
= 0x7f30d2c90038, pagemap = 0x7f30d2c90480}, {segment =
0x56134dfa2258, mapped_address = 0x7f30cfc76000 "",
header = 0x7f30cfc76000, fpm = 0x7f30cfc76038, pagemap =
0x7f30cfc76480}, {segment = 0x56134ee12048, mapped_address =
0x7f307599e000 "", header = 0x7f307599e000, fpm = 0x7f307599e038,
pagemap = 0x7f307599e480}, {segment = 0x56134ee11ff8,
mapped_address = 0x7f307b9d0000 "", header = 0x7f307b9d0000, fpm =
0x7f307b9d0038, pagemap = 0x7f307b9d0480}, {
segment = 0x56134ee11fa8, mapped_address = 0x7f3087a32000 "",
header = 0x7f3087a32000, fpm = 0x7f3087a32038, pagemap =
0x7f3087a32480}, {segment = 0x56134dfa2dd8,
mapped_address = 0x7f309faf4000 "", header = 0x7f309faf4000, fpm
= 0x7f309faf4038, pagemap = 0x7f309faf4480}, {segment =
0x56134dfa1fb8, mapped_address = 0x7f30d62d3000 "",
header = 0x7f30d62d3000, fpm = 0x7f30d62d3038, pagemap =
0x7f30d62d3480}, {segment = 0x56134dfa1f68, mapped_address =
0x7f31365d5000 "", header = 0x7f31365d5000, fpm = 0x7f31365d5038,
pagemap = 0x7f31365d5480}, {segment = 0x56134ee12098,
mapped_address = 0x7f306599e000 "", header = 0x7f306599e000, fpm =
0x7f306599e038, pagemap = 0x7f306599e480}, {
segment = 0x56134ee120e8, mapped_address = 0x7f305599e000 "",
header = 0x7f305599e000, fpm = 0x7f305599e038, pagemap =
0x7f305599e480}, {segment = 0x56134ee12138,
mapped_address = 0x7f303599e000 "", header = 0x7f303599e000, fpm
= 0x7f303599e038, pagemap = 0x7f303599e480}, {segment =
0x56134ee12188, mapped_address = 0x7f301599e000 "",
header = 0x7f301599e000, fpm = 0x7f301599e038, pagemap =
0x7f301599e480}, {segment = 0x56134ee121d8, mapped_address =
0x7f2fd599e000 "", header = 0x7f2fd599e000, fpm = 0x7f2fd599e038,
pagemap = 0x7f2fd599e480}, {segment = 0x56134ee12228,
mapped_address = 0x7f2f9599e000 "", header = 0x7f2f9599e000, fpm =
0x7f2f9599e038, pagemap = 0x7f2f9599e480}, {
segment = 0x56134ee12278, mapped_address = 0x7f2f1599e000 "",
header = 0x7f2f1599e000, fpm = 0x7f2f1599e038, pagemap =
0x7f2f1599e480}, {segment = 0x56134ee122c8,
mapped_address = 0x7f2e9599e000 "", header = 0x7f2e9599e000, fpm
= 0x7f2e9599e038, pagemap = 0x7f2e9599e480}, {segment = 0x0,
mapped_address = 0x0, header = 0x0, fpm = 0x0,
pagemap = 0x0} <repeats 1000 times>}, high_segment_index = 23,
freed_segment_counter = 0}

(gdb) frame 1
(gdb) print *hashtable
$2 = {nbuckets = 67108864, log2_nbuckets = 26, nbuckets_original =
67108864, nbuckets_optimal = 67108864, log2_nbuckets_optimal = 26,
buckets = {unshared = 0x7f31f6cd8000,
shared = 0x7f31f6cd8000}, keepNulls = false, skewEnabled = false,
skewBucket = 0x0, skewBucketLen = 0, nSkewBuckets = 0, skewBucketNums
= 0x0, nbatch = 1, curbatch = 0, nbatch_original = 1,
nbatch_outstart = 1, growEnabled = true, totalTuples = 65785362,
partialTuples = 5057580, skewTuples = 0, innerBatchFile = 0x0,
outerBatchFile = 0x0, outer_hashfunctions = 0x56134e1e04b8,
inner_hashfunctions = 0x56134e1e0508, hashStrict = 0x56134e1e0558,
collations = 0x56134e1e0570, spaceUsed = 0, spaceAllowed =
13958643712, spacePeak = 0, spaceUsedSkew = 0,
spaceAllowedSkew = 279172874, hashCxt = 0x56134e1e03a0, batchCxt =
0x56134e1e23b0, chunks = 0x0, current_chunk = 0x0, area =
0x56134e07d718, parallel_state = 0x7f321dfa4400,
batches = 0x56134e1e07f8, current_chunk_shared = 0}

This is the code where the crash happened
https://github.com/postgres/postgres/blob/8e5faba4b918ba6142339c8f55eaa4eb99776a03/src/backend/utils/mmgr/dsa.c#L835-L840:

/* Locate the object, span and pool. */
segment_map = get_segment_by_index(area, DSA_EXTRACT_SEGMENT_NUMBER(dp));
pageno = DSA_EXTRACT_OFFSET(dp) / FPM_PAGE_SIZE;
span_pointer = segment_map->pagemap[pageno];
span = dsa_get_address(area, span_pointer);
superblock = dsa_get_address(area, span->start);

(gdb) print *segment_map
$4 = {segment = 0x56134dfa2dd8, mapped_address = 0x7f309faf4000 "",
header = 0x7f309faf4000, fpm = 0x7f309faf4038, pagemap =
0x7f309faf4480}

(gdb) print pageno
$5 = 196979

(gdb) print span_pointer
$6 = 0
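
For reference, here is a minimal sketch of how a dsa_pointer splits into a segment number and a page number. It is a simplified model, not the dsa.c code: the `sketch_*` names are hypothetical, and the constants (40 offset bits, 4096-byte pages) are my assumption for a 64-bit build and should be checked against dsa.h and freepage.h. The `segment_map` printed above matches entry 13 of `segment_maps` in the `*area` dump, and `pageno` is 196979; `dp` itself was optimized out, so the in-page offset used in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for a 64-bit build (not taken from the core dump). */
#define SKETCH_OFFSET_WIDTH 40
#define SKETCH_PAGE_SIZE    4096

typedef uint64_t sketch_dsa_pointer;

/* Build a pointer from a segment index and a byte offset in that segment. */
static sketch_dsa_pointer
sketch_make_pointer(uint64_t segment, uint64_t offset)
{
    return (segment << SKETCH_OFFSET_WIDTH) | offset;
}

/* High bits: index into area->segment_maps. */
static uint64_t
sketch_segment_number(sketch_dsa_pointer dp)
{
    return dp >> SKETCH_OFFSET_WIDTH;
}

/* Low bits: byte offset within the segment. */
static uint64_t
sketch_offset(sketch_dsa_pointer dp)
{
    return dp & (((uint64_t) 1 << SKETCH_OFFSET_WIDTH) - 1);
}

/* Page index used to look up the span in the segment's pagemap. */
static uint64_t
sketch_pageno(sketch_dsa_pointer dp)
{
    return sketch_offset(dp) / SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a pointer built with `sketch_make_pointer(13, 196979 * 4096 + 100)` decodes back to segment 13 and page 196979, so the freed object would sit roughly 806825984 bytes into that segment.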

It looks like `span_pointer` being 0 makes `span` NULL, so `span->start`
causes the segfault.
`span_pointer` is 0 because every `segment_map->pagemap` entry I checked is zero:

(gdb) print segment_map->pagemap[0]
$10 = 0
(gdb) print segment_map->pagemap[1]
$11 = 0
(gdb) print segment_map->pagemap[2]
$12 = 0
(gdb) print segment_map->pagemap[265]
$14 = 0
(gdb) print segment_map->pagemap[187387]
$15 = 0
(gdb) print segment_map->pagemap[196979]
$16 = 0
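
To make the suspected failure mode concrete, here is a small stand-alone model of the lookup chain. It is only a sketch under my own assumptions, not PostgreSQL code and not a proposed fix: all `sketch_*` names are hypothetical, 0 plays the role of an invalid pointer, and the NULL check marks the point where the real code dereferences `span` without one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_pointer;    /* 0 stands in for an invalid pointer */

typedef struct
{
    sketch_pointer start;           /* models span->start */
} sketch_span;

/* Stand-in for address translation: an invalid (zero) pointer maps to NULL. */
static sketch_span *
sketch_get_address(sketch_span *spans, sketch_pointer p)
{
    if (p == 0)
        return NULL;
    return &spans[p - 1];           /* hypothetical 1-based span handle */
}

/*
 * Look up the span for a page and read its start.
 * Returns 0 on success, -1 if the pagemap entry is empty.
 */
static int
sketch_lookup_span_start(const sketch_pointer *pagemap, size_t pageno,
                         sketch_span *spans, sketch_pointer *start)
{
    sketch_span *span = sketch_get_address(spans, pagemap[pageno]);

    if (span == NULL)
        return -1;                  /* without this check: NULL dereference */
    *start = span->start;
    return 0;
}
```

With an all-zero pagemap, as in the core dump, every lookup in this model takes the -1 path. That matches the crash, but it does not explain why the pagemap of that segment is empty in the first place.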

Regards,
Marcin Barczyński
