From: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> |
---|---|
To: | kuntalghosh(dot)2007(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | Re: BUG #18559: Crash after detaching a partition concurrently from another session |
Date: | 2024-07-30 13:52:33 |
Message-ID: | CAGz5QC+yTwKXAYgvvmiPAwez4CVkwO==ECDK+3fMAJ8j9Lcbkw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Tue, Jul 30, 2024 at 7:18 PM PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
Adding Alvaro.
>
> The following code assumes that an pg_class entry for the detached partition
> will always be available which is wrong.
>
Referring to the following code.
/*
* Two problems are possible here. First, a concurrent ATTACH
* PARTITION might be in the process of adding a new partition, but
* the syscache doesn't have it, or its copy of it does not yet have
* its relpartbound set. We cannot just AcceptInvalidationMessages(),
* because the other process might have already removed itself from
* the ProcArray but not yet added its invalidation messages to the
* shared queue. We solve this problem by reading pg_class directly
* for the desired tuple.
*
* The other problem is that DETACH CONCURRENTLY is in the process of
* removing a partition, which happens in two steps: first it marks it
* as "detach pending", commits, then unsets relpartbound. If
* find_inheritance_children_extended included that partition but we
* below we see that DETACH CONCURRENTLY has reset relpartbound for
* it, we'd see an inconsistent view. (The inconsistency is seen
* because table_open below reads invalidation messages.) We protect
* against this by retrying find_inheritance_children_extended().
*/
if (boundspec == NULL)
{
Relation pg_class;
SysScanDesc scan;
ScanKeyData key[1];
Datum datum;
bool isnull;
pg_class = table_open(RelationRelationId, AccessShareLock);
ScanKeyInit(&key[0],
Anum_pg_class_oid,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(inhrelid));
scan = systable_beginscan(pg_class, ClassOidIndexId, true,
NULL, 1, key);
tuple = systable_getnext(scan);
datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
RelationGetDescr(pg_class), &isnull);
if (!isnull)
boundspec = stringToNode(TextDatumGetCString(datum));
systable_endscan(scan);
table_close(pg_class, AccessShareLock);
IIUC, a simple fix would be to retry if an entry is not found. Attached a patch.
--
Thanks & Regards,
Kuntal Ghosh
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-creation-of-partition-descriptor-during-concurre.patch | application/octet-stream | 1.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | semab tariq | 2024-07-30 14:54:41 | Re: Installer initialization failed |
Previous Message | PG Bug reporting form | 2024-07-30 13:47:15 | BUG #18559: Crash after detaching a partition concurrently from another session |