Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)
Date: 2020-08-07 03:31:28
Message-ID: 2717732.1596771088@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> On Fri, Aug 7, 2020 at 9:32 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... Amit's recipe for reproducing the bug works because there are
>> other relation lock acquisitions (and hence AcceptInvalidationMessages
>> calls) later in planning than where he asked us to wait. So this
>> effectively tests a scenario where a very late A.I.M. call within the
>> planner detects an inval event for some already-planned relation, and
>> that seems like a valid-enough scenario.

> Agreed.

> Curiously, Justin mentioned upthread that the crash occurred during
> BIND of a prepared query, so it better had been that a custom plan was
> being executed, because a generic one based on fewer partitions would
> be thrown away due to A.I.M. invoked during AcquireExecutorLocks().

Based on the above, it seems plausible that the plancache did throw away
an old plan and try to replan, but the inval message announcing partition
addition arrived too late during that planning cycle. Just like the
normal execution path, the plancache code path won't do more than one
iteration of planning on the way to a demanded query execution.

>> (BTW, I checked and found that this test does *not* expose the problems
>> with Amit's original patch. Not sure if it's worth trying to finagle
>> it so it does.)

> I tried to figure out a scenario where my patch would fail but
> couldn't come up with one either, but it's no proof that it isn't
> wrong. For example, I can see that pinfo->relid_map[pinfo->nparts]
> can be accessed with my patch which is not correct.

Yeah, touching array entries off the end of the relid_map array definitely
seems possible with that coding. But the scenario I was worried about
was that the loop actually attaches the wrong subplan (one for a different
partition) to a partdesc entry. In an assert-enabled build, that would
have led to assertion failure just below, because then we could not match
up all the remaining relid_map entries; but in a non-assert build, we'd
plow through and bad things would likely happen during execution.
You might need further conditions, like the partitions not being all
identical, for that to actually cause any problem. I'd poked at this
for a little bit without causing an obvious crash, but I can't claim
to have tried hard.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-08-07 03:33:37 Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)
Previous Message Amit Langote 2020-08-07 03:16:11 Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)