FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <rhaas(at)postgresql(dot)org>
Subject: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)
Date: 2020-08-02 18:11:31
Message-ID: 20200802181131.GA27754@telsasoft.com
Lists: pgsql-hackers

Core was generated by `postgres: telsasoft ts [local] BIND '.

(gdb) bt
#0 0x00007f0951303387 in raise () from /lib64/libc.so.6
#1 0x00007f0951304a78 in abort () from /lib64/libc.so.6
#2 0x0000000000921005 in ExceptionalCondition (conditionName=conditionName@entry=0xa5db3d "pd_idx == pinfo->nparts", errorType=errorType@entry=0x977389 "FailedAssertion",
fileName=fileName@entry=0xa5da88 "execPartition.c", lineNumber=lineNumber@entry=1689) at assert.c:67
#3 0x0000000000672806 in ExecCreatePartitionPruneState (planstate=planstate@entry=0x908f6d8, partitionpruneinfo=<optimized out>) at execPartition.c:1689
#4 0x000000000068444a in ExecInitAppend (node=node@entry=0x7036b90, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeAppend.c:132
#5 0x00000000006731fd in ExecInitNode (node=0x7036b90, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:179
#6 0x000000000069d03a in ExecInitResult (node=node@entry=0x70363d8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeResult.c:210
#7 0x000000000067323c in ExecInitNode (node=0x70363d8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:164
#8 0x000000000069e834 in ExecInitSort (node=node@entry=0x7035ca8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeSort.c:210
#9 0x0000000000672ff0 in ExecInitNode (node=0x7035ca8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:313
#10 0x00000000006812e8 in ExecInitAgg (node=node@entry=0x68311d0, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeAgg.c:3292
#11 0x0000000000672fb1 in ExecInitNode (node=0x68311d0, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:328
#12 0x000000000068925a in ExecInitGatherMerge (node=node@entry=0x6830998, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeGatherMerge.c:110
#13 0x0000000000672f33 in ExecInitNode (node=0x6830998, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:348
#14 0x00000000006812e8 in ExecInitAgg (node=node@entry=0x682eda8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeAgg.c:3292
#15 0x0000000000672fb1 in ExecInitNode (node=node@entry=0x682eda8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:328
#16 0x000000000066c8e6 in InitPlan (eflags=16, queryDesc=<optimized out>) at execMain.c:1020
#17 standard_ExecutorStart (queryDesc=<optimized out>, eflags=16) at execMain.c:266
#18 0x00007f0944ca83b5 in pgss_ExecutorStart (queryDesc=0x1239b08, eflags=<optimized out>) at pg_stat_statements.c:1007
#19 0x00007f09117e4891 in explain_ExecutorStart (queryDesc=0x1239b08, eflags=<optimized out>) at auto_explain.c:301
#20 0x00000000007f9983 in PortalStart (portal=0xeff810, params=0xfacc98, eflags=0, snapshot=0x0) at pquery.c:505
#21 0x00000000007f7370 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xeb8500, dbname=0xeb84e0 "ts", username=<optimized out>) at postgres.c:1987
#22 0x000000000048916e in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4523
#23 BackendStartup (port=0xeb1000) at postmaster.c:4215
#24 ServerLoop () at postmaster.c:1727
#25 0x000000000076ec85 in PostmasterMain (argc=argc@entry=13, argv=argv@entry=0xe859b0) at postmaster.c:1400
#26 0x000000000048a82d in main (argc=13, argv=0xe859b0) at main.c:210

#3 0x0000000000672806 in ExecCreatePartitionPruneState (planstate=planstate@entry=0x908f6d8, partitionpruneinfo=<optimized out>) at execPartition.c:1689
pd_idx = <optimized out>
pp_idx = <optimized out>
pprune = 0x908f910
partdesc = 0x91937f8
pinfo = 0x7d6ee78
partrel = <optimized out>
partkey = 0xfbba28
lc2__state = {l = 0x7d6ee20, i = 0}
partrelpruneinfos = 0x7d6ee20
lc2 = <optimized out>
npartrelpruneinfos = <optimized out>
prunedata = 0x908f908
j = 0
lc__state = {l = 0x7d6edc8, i = 0}
estate = 0x11563f0
prunestate = 0x908f8b0
n_part_hierarchies = <optimized out>
lc = <optimized out>
i = 0

(gdb) p *pinfo
$2 = {type = T_PartitionedRelPruneInfo, rtindex = 7, present_parts = 0x7d6ef10, nparts = 414, subplan_map = 0x7d6ef68, subpart_map = 0x7d6f780, relid_map = 0x7d6ff98, initial_pruning_steps = 0x7d707b0,
exec_pruning_steps = 0x0, execparamids = 0x0}

(gdb) p pd_idx
$3 = <optimized out>

< 2020-08-02 02:04:17.358 SST >LOG: server process (PID 20954) was terminated by signal 6: Aborted
< 2020-08-02 02:04:17.358 SST >DETAIL: Failed process was running:
INSERT INTO child.cdrs_data_users_per_cell_20200801 (...list of columns elided...)
(
SELECT ..., $3::timestamp, $2,
MODE() WITHIN GROUP (ORDER BY ...) AS ..., STRING_AGG(DISTINCT ..., ',') AS ..., ...

This crashed at 2am, which at first I thought was maybe due to simultaneously
creating today's partition.

Aug 2 02:04:08 telsasoftsky abrt-hook-ccpp: Process 19264 (postgres) of user 26 killed by SIGABRT - dumping core
Aug 2 02:04:17 telsasoftsky abrt-hook-ccpp: Process 20954 (postgres) of user 26 killed by SIGABRT - ignoring (repeated crash)

Running:
postgresql13-server-13-beta2_1PGDG.rhel7.x86_64

Maybe this is a problem tickled by something new in v13. However, this is a
new VM, and at the time of the crash I was running a shell loop around
pg_restore, in reverse-chronological order. I have full logs, and I found that
a table which the crashing process would've tried to SELECT FROM had just been
CREATEd:

| 2020-08-02 02:04:01.48-11 | duration: 106.275 ms statement: CREATE TABLE child.cdrs_huawei_sgwrecord_2019_06_14 (

That table *currently* has:
|Number of partitions: 416 (Use \d+ to list them.)
And the oldest table is still child.cdrs_huawei_sgwrecord_2019_06_14 (since the
shell loop probably quickly spun through hundreds of pg_restores, failing to
connect to the database "in recovery"). And today's partition was already
created, at: 2020-08-02 01:30:35. So I think

Based on commit logs, I suspect this may be an "older bug", specifically maybe
with:

|commit 898e5e3290a72d288923260143930fb32036c00c
|Author: Robert Haas <rhaas(at)postgresql(dot)org>
|Date: Thu Mar 7 11:13:12 2019 -0500
|
| Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.

I don't think it matters, but the process surrounding the table being INSERTed
INTO is more than a little special, involving renames, detaches, creation, and
re-attaching within a transaction. I think the issue instead surrounds the
table being SELECTed *from*, which is actually behind a view.
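
For readers unfamiliar with this kind of check, here is a minimal sketch of one
way a plan-time partition list (414 entries in the pinfo dump above) could be
reconciled with a larger execution-time list (the table currently shows 416).
This is NOT the PostgreSQL source: the function names, the index names, and the
single-pass matching strategy are all assumptions made for illustration. In
this model, a final check that every plan-time entry was matched holds only if
the plan-time OIDs appear, in the same order, within the execution-time OIDs.

```c
#include <assert.h>   /* only needed by callers that assert on the result */
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid type */

/*
 * Hypothetical helper (not PostgreSQL code): walk the execution-time
 * partition OIDs once, advancing through the plan-time OIDs on each match.
 * Returns the number of plan-time entries that were matched.
 */
static size_t
count_matched(const Oid *plan_oids, size_t plan_n,
              const Oid *exec_oids, size_t exec_n)
{
    size_t plan_idx = 0;

    for (size_t exec_idx = 0; exec_idx < exec_n; exec_idx++)
    {
        /* A partition unknown to the plan is skipped, not matched. */
        if (plan_idx < plan_n && plan_oids[plan_idx] == exec_oids[exec_idx])
            plan_idx++;
    }
    return plan_idx;
}

/*
 * Hypothetical analogue of an end-of-loop sanity check: true only if
 * every plan-time partition was found, in order, at execution time.
 */
static bool
plan_matches_exec(const Oid *plan_oids, size_t plan_n,
                  const Oid *exec_oids, size_t exec_n)
{
    return count_matched(plan_oids, plan_n, exec_oids, exec_n) == plan_n;
}
```

In this model, extra execution-time partitions are harmless as long as the
plan-time ones still appear in their original relative order; a reordering or
a plan-time entry with no execution-time counterpart leaves entries unmatched,
and a check such as assert(plan_matches_exec(...)) would then fail. Whether
that is what happened in this crash is not established by the report.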
