Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: 1026592243(at)qq(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Date: 2024-05-22 03:46:52
Message-ID: CAHewXNk125iiK4MgygmdwqGP=CQ4ZOMWLRr5+t9hAkXX8xd1ag@mail.gmail.com
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote on Tue, May 21, 2024 at 00:16:

> On 2024-May-13, Alvaro Herrera wrote:
>
> > Evidently there's something I'm missing in how this plancache
> > invalidation works.
>
> Actually, what I was missing is that I had forgotten how
> PartitionDirectory is actually used. In the original code I wrote for
> DETACH CONCURRENTLY for 2ndQPostgres a few years ago, I set things up so
> that both planner and executor used the same PartitionDesc for a
> partitioned table. But when this was reworked for real Postgres by
> Robert to implement concurrent ATTACH, he instead introduced
> PartitionDirectory which keeps a PartitionDesc unchanged -- but planner
> and executor use different PartitionDirectories. I probably reworked
> some of DETACH CONCURRENTLY to cope with that, but evidently must have
> missed this scenario.
>
> So the problem is that we read the partition descriptor during
> re-planning with N partitions, and then if we receive the corresponding
> invalidation message (at executor time) exactly before
> CreatePartitionPruneState() creates its PartitionDirectory, then we read
> N-1 partitions, and everything explodes. (If the invalidation is
> received at some other time, then the plan is remade before
> ExecInitAppend by plancache, and everything works correctly.)
>
> So one fix for this problem seems to be to patch
> CreatePartitionPruneState() to accept the possibility that one partition
> has been detached concurrently, and construct the planmap/partmap arrays
> with a marking that the partition in question has been pruned away.
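If I'm reading this fix correctly, the merge ends up shaped roughly like
the toy program below. This is only my own sketch to check my
understanding, not the patch itself: the OID values and array names are
invented, and I'm assuming both OID lists stay in partition bound order,
so a single linear merge is enough.

/*
 * Toy model: map the planner's partition list onto the executor's,
 * treating a plan-time partition that has vanished (concurrent
 * DETACH) as pruned rather than failing the Assert.
 */
#include <stdio.h>

int
main(void)
{
    /* OIDs the planner saw (N partitions)... */
    unsigned plan_oids[] = {1001, 1002, 1003, 1004};
    int      nplan = 4;
    /* ...and what the executor sees after a concurrent detach (N-1). */
    unsigned exec_oids[] = {1001, 1002, 1004};
    int      nexec = 3;
    int      planmap[4];        /* plan index -> executor index, or -1 */
    int      e = 0;

    for (int p = 0; p < nplan; p++)
    {
        if (e < nexec && exec_oids[e] == plan_oids[p])
            planmap[p] = e++;   /* present in both views */
        else
            planmap[p] = -1;    /* detached concurrently: prune it */
    }

    for (int p = 0; p < nplan; p++)
        printf("plan partition %u -> %d\n", plan_oids[p], planmap[p]);
    return 0;
}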
>
>
> With that fix in place, I was now getting errors from
> RelationBuildPartitionDesc() that some partition in the partdesc had
> NULL relpartbound. If I understand correctly, the problem here is that
> we have a snapshot that still sees that partition as being in the
> hierarchy, but in reality its relpartbound has been nullified by the
> concurrent detach. I can fix this symptom by doing
> AcceptInvalidationMessages() and starting back at the top, where the list
> of partitions is obtained by find_inheritance_children_extended().
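Just to make sure I follow the retry shape: something like the
stand-alone sketch below, where the catalog reads are stubbed out and
all names are my own. What it demonstrates is the invariant you
describe, that one AcceptInvalidationMessages() plus one restart must
be enough to see a consistent state.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static int attempts = 0;

/*
 * Stub: the first read races with DETACH CONCURRENTLY and sees a NULL
 * relpartbound; once invalidations are processed, the detached
 * partition drops out of the inheritance list and the read succeeds.
 */
static bool
bound_is_null(void)
{
    return attempts++ == 0;
}

static void
accept_invalidation_messages(void)
{
    printf("processing invalidation messages\n");
}

int
main(void)
{
    bool retried = false;

retry:
    if (bound_is_null())
    {
        if (retried)
        {
            /* A second failure would mean something else is broken. */
            fprintf(stderr, "relpartbound still NULL after retry\n");
            return EXIT_FAILURE;
        }
        accept_invalidation_messages();
        retried = true;
        goto retry;
    }
    printf("partition descriptor built\n");
    return EXIT_SUCCESS;
}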
>
> With these two things in place, the pgbench scripts run without errors.
> That's in patch 0001, which deserves a commit message that distills the
> above.
>
>
> I'm not convinced that this fix is correct or complete, but hey, it's
> progress. Here's some things I'm worried about:
>
> 1. this code assumes that only a single partition can be in this "detach
> concurrently" mode. This sounds correct, because this operation needs
> share update exclusive lock on the table, and we only allow one
> partition to be in pending-detach state. However, what if we have a
> very slow process on the query side, and two processes manage to
> detach two different tables in between? I think this isn't possible
> because of the wait-for-snapshots phase, but I need to convince myself
> that this is the right explanation.
>
> 2. The new code in CreatePartitionPruneState assumes that if nparts
> decreases, then it must be a detach, and if nparts increases, it must be
> an attach. Can these two things happen together in a way that we see
> that the number of partitions remains the same, so we don't actually try
> to construct planmap/partmap arrays by matching their OIDs? I think the
> only way to handle a possible problem here would be to verify the OIDs
> every time we construct a partition descriptor. I assume (without
> checking) this would come with a performance cost, not sure.
>
> 3. The new stanza in CreatePartitionPruneState() should have more
> sanity-checks that the two arrays we scan really match.
>
> 4. I made RelationBuildPartitionDesc retry only once. I think this
> should be enough, because a single AcceptInvalidationMessages should
> process everything that we need to read to bring us up to current
> reality, considering that DETACH CONCURRENTLY would need to wait for our
> snapshot.
>
> I would like to spend some more time on this, but I'm afraid I'll have
> to put it off for a little while.
>
> PS -- yes, the self-contradictory comment in CreatePartitionPruneState()
> is bogus. Patch 0002 fixes that.
>

I have tested this patch locally and did not encounter any failures.
I will take some time to review the patch in detail and consider the
issues you mentioned.
--
Tender Wang
OpenPie: https://en.openpie.com/
