Re: Forget close an open relation in ReorderBufferProcessTXN()

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Japin Li <japinli(at)hotmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Forget close an open relation in ReorderBufferProcessTXN()
Date: 2021-05-22 02:57:37
Message-ID: CA+HiwqFnjnB0ttgzqSNfJY2nq9xr9S5ywDrj=ujYzmCcEZGvug@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 22, 2021 at 11:00 AM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> On Friday, May 21, 2021 9:45 PM I wrote:
> > On Friday, May 21, 2021 4:43 PM Amit Langote <amitlangote09(at)gmail(dot)com>
> > wrote:
> > > On Fri, May 21, 2021 at 3:55 PM osumi(dot)takamichi(at)fujitsu(dot)com
> > > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > > > But, I've detected segmentation faults caused by the patch, which
> > > > can happen during 100_bugs.pl in src/test/subscription.
> > >
> > > Hmm, maybe get_rel_sync_entry() should explicitly set map to NULL when
> > > first initializing an entry. It's possible that without doing so, the
> > > map remains set to a garbage value, which causes an invalidation
> > > callback that runs into such a partially initialized entry to segfault
> > > upon trying to dereference that garbage pointer.
> > Just in case, I prepared a fresh PG build and made
> > get_rel_sync_entry() print the pointer's initial value with elog.
> > When I then ran 100_bugs.pl, I got the garbage shown below.
> >
> > * The change I did:
> > @@ -1011,6 +1011,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
> > entry->pubactions.pubinsert =
> > entry->pubactions.pubupdate =
> > entry->pubactions.pubdelete =
> > entry->pubactions.pubtruncate = false;
> > entry->publish_as_relid = InvalidOid;
> > + elog(LOG, "**> the pointer's default value : %p",
> > + entry->map);
> > }
> >
> (snip)
> >
> > So, your solution is right, I think.
> This was a bit indirect.
> I've checked the core file from v3's failure and printed the entry
> to be more confident. Sorry for the roundabout way of verifying the solution.
>
> $1 = {relid = 16388, schema_sent = false, streamed_txns = 0x0, replicate_valid = false, pubactions = {pubinsert = false, pubupdate = false, pubdelete = false, pubtruncate = false}, publish_as_relid = 16388,
> map = 0x7f7f7f7f7f7f7f7f}
>
> Yes, the process tried to free garbage.
> Now we can be confident that we have addressed the problem. That's it!

Thanks for confirming that.
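
For anyone reading this later, here is a minimal standalone sketch of the
failure mode and the fix discussed above. It is not the pgoutput code:
DemoEntry, DemoMap and the demo_* functions are hypothetical stand-ins, and
the 0x7f fill only imitates the garbage value seen in the core dump. The
point is that a new entry comes back uninitialized, so map has to be set to
NULL before any invalidation callback can look at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the tuple conversion map (not a PostgreSQL type). */
typedef struct DemoMap
{
	int			natts;
} DemoMap;

/* Hypothetical, reduced version of a per-relation cache entry. */
typedef struct DemoEntry
{
	unsigned int relid;
	bool		replicate_valid;
	DemoMap    *map;
} DemoEntry;

/*
 * Imitates a hash table handing back a new entry whose payload has not
 * been initialized. The 0x7f fill is an assumption chosen to resemble
 * map = 0x7f7f7f7f7f7f7f7f from the core dump.
 */
static DemoEntry *
demo_alloc_entry(void)
{
	DemoEntry  *entry = malloc(sizeof(DemoEntry));

	if (entry == NULL)
		abort();
	memset(entry, 0x7f, sizeof(DemoEntry));
	return entry;
}

/* First-time initialization: every field is set, including map. */
static void
demo_init_entry(DemoEntry *entry, unsigned int relid)
{
	assert(entry != NULL);
	entry->relid = relid;
	entry->replicate_valid = false;
	entry->map = NULL;			/* the fix: never leave this as garbage */
}

/*
 * Imitates the invalidation callback: frees map only if one was built.
 * Returns true if something was freed.
 */
static bool
demo_invalidate(DemoEntry *entry)
{
	bool		freed = false;

	if (entry->map != NULL)
	{
		free(entry->map);
		entry->map = NULL;
		freed = true;
	}
	entry->replicate_valid = false;
	return freed;
}
```

If demo_init_entry() left map untouched, demo_invalidate() would see a
non-NULL garbage pointer and pass it to free(), which is the same shape of
crash as in the core file above.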

--
Amit Langote
EDB: http://www.enterprisedb.com
