Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tender Wang <tndrwang(at)gmail(dot)com>
Cc: 1026592243(at)qq(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Date: 2024-06-05 15:39:44
Message-ID: 202406051539.h3o6qgkri5ij@alvherre.pgsql
Lists: pgsql-bugs

On 2024-May-22, Tender Wang wrote:

> I have tested this patch locally and did not encounter any failures.
> I will take time to look the patch in detail and consider the issues you
> mentioned.

Thank you. In the meantime,

> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote on Tue, 21 May 2024 at 00:16:

> > 2. The new code in CreatePartitionPruneState assumes that if nparts
> > decreases, then it must be a detach, and if nparts increases, it must be
> > an attach. Can these two things happen together in a way that we see
> > that the number of partitions remains the same, so we don't actually try
> > to construct planmap/partmap arrays by matching their OIDs? I think the
> > only way to handle a possible problem here would be to verify the OIDs
> > every time we construct a partition descriptor. I assume (without
> > checking) this would come with a performance cost, not sure.

I modified the scripts slightly so that two partitions would be
detached, and lo and behold -- the case where we have one new partition
appearing and one partition disappearing concurrently can indeed happen.
So the two nparts values are identical, but the OID arrays don't
match. I have attached the scripts I used for testing.

I think in order to fix this we would have to compare the OID arrays
each time through CreatePartitionPruneState, so that we can mark as
"pruned" (value -1) any partition that is not present in both partdescs.
Having to compare the whole arrays each and every time might not be
great, but I don't see any backpatchable alternative at the moment.
Going forward, we could avoid the hit by having something like a
generation counter for the partitioned table (which is incremented for
each attach and detach), but of course that's not backpatchable.
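
To illustrate the kind of OID comparison described above, here is a minimal
standalone sketch. It is not the patch and not PostgreSQL code: the helper
name remap_by_oid, its parameters, and the Oid typedef are all hypothetical,
and it uses a plain nested search rather than anything tuned for cost.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;	/* stand-in for PostgreSQL's Oid type */

/*
 * Hypothetical helper (an assumption for illustration only).
 *
 * plan_oids[i] / plan_map[i]: partition OIDs known at plan time and the
 * subplan index assigned to each (-1 if none).
 * exec_oids[j]: partition OIDs found in the descriptor at execution time.
 *
 * On return, out_map[j] holds the subplan index for exec_oids[j], or -1
 * when that partition was not known at plan time.  Partitions present
 * only at plan time simply have no entry in out_map.
 *
 * Returns the number of execution-time partitions that were matched.
 */
static int
remap_by_oid(const Oid *plan_oids, const int *plan_map, int plan_n,
			 const Oid *exec_oids, int exec_n, int *out_map)
{
	int			matched = 0;

	assert(plan_n >= 0 && exec_n >= 0);

	for (int j = 0; j < exec_n; j++)
	{
		out_map[j] = -1;		/* treat as pruned until proven otherwise */

		for (int i = 0; i < plan_n; i++)
		{
			if (plan_oids[i] == exec_oids[j])
			{
				out_map[j] = plan_map[i];
				matched++;
				break;
			}
		}
	}

	return matched;
}
```

With plan-time OIDs {100, 101, 102} mapped to subplans {0, 1, 2} and
execution-time OIDs {100, 102, 103}, both counts are 3, yet the resulting
map is {0, 2, -1}: a comparison of nparts alone would not notice the
difference.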

PS: the pg_advisory_unlock() calls are necessary, because otherwise the
first session whose try_lock call succeeds keeps the lock for the whole
duration of the pgbench script, so the other sessions always skip the
"\if :gotlock" block.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php

Attachment        Content-Type      Size
18377-0.sql       application/sql   51 bytes
18377-1.sql       application/sql   448 bytes
18377-2.sql       application/sql   448 bytes
18377-3.sql       application/sql   266 bytes
18377-setup.sql   application/sql   441 bytes
