Re: hyrax vs. RelationBuildPartitionDesc

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: hyrax vs. RelationBuildPartitionDesc
Date: 2019-06-05 11:03:48
Message-ID: CA+HiwqEwrP0TettuL5bbwGH_hu3L=5am9pajy7FmASQYDBq4aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 4, 2019 at 9:25 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 17, 2019 at 4:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Yeah, I did some additional testing that showed that it's pretty darn
> > hard to get the leak to amount to anything. The test case that I
> > originally posted did many DDLs in a single transaction, and it
> > seems that that's actually essential to get a meaningful leak; as
> > soon as you exit the transaction the leaked contexts will be recovered
> > during sinval cleanup.
>
> My colleague Amul Sul rediscovered this same leak when he tried to
> attach lots of partitions to an existing partitioned table, all in the
> course of a single transaction. This seems a little less artificial
> than Tom's original reproducer, which involved attaching and detaching
> the same partition repeatedly.
>
> Here is a patch that tries to fix this, along the lines I previously
> suggested; Amul reports that it does work for him.

Thanks for the patch.

I noticed a crash with one of the scenarios that I think the patch is
meant to address. Let me describe the steps:

Attach gdb (or whatever) to session 1 in which we'll run a query that
will scan at least two partitions and set a breakpoint in
expand_partitioned_rtentry(). Run the query, so the breakpoint will
be hit. Step through up to the start of the following loop in this
function:

i = -1;
while ((i = bms_next_member(live_parts, i)) >= 0)
{
Oid childOID = partdesc->oids[i];
Relation childrel;
RangeTblEntry *childrte;
Index childRTindex;
RelOptInfo *childrelinfo;

/* Open rel, acquiring required locks */
childrel = table_open(childOID, lockmode);

Note that 'partdesc' in the above code is from the partition
directory. Before stepping through into the loop, start another
session and attach a new partition. On into the loop. When the 1st
partition is opened, its locking will result in
RelationClearRelation() being called on the parent relation due to
partition being attached concurrently, which I observed, is actually
invoked a couple of times due to recursion. Parent relation's
rd_oldpdcxt will be set in this process, which contains the
aforementioned partition descriptor.

Before moving the loop to process the 2nd partition, attach another
partition in session 2. When the 2nd partition is opened,
RelationClearRelation() will run again and in one of its recursive
invocations, it destroys the rd_oldpdcxt that was set previously,
taking the partition directory's partition descriptor with it.
Anything that tries to access it later crashes.

I couldn't figure out what to blame here though -- the design of
rd_pdoldcxt or the fact that RelationClearRelation() is invoked
multiple times. I will try to study it more closely tomorrow.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-06-05 11:18:15 Re: Confusing comment for function ExecParallelEstimate
Previous Message David Rowley 2019-06-05 09:08:46 Re: Confusing error message for REINDEX TABLE CONCURRENTLY