From: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
---|---|
To: | 'Mark Dilger' <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | "akapila(at)postgresql(dot)org" <akapila(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Fix for segfault in logical replication on master |
Date: | 2021-06-17 10:39:32 |
Message-ID: | OSBPR01MB4888BA75D9B4CCCA0D78D55FED0E9@OSBPR01MB4888.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thursday, June 17, 2021 2:43 PM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > On Jun 16, 2021, at 10:19 PM, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> >
> > I started to analyze your report and
> > will reply after my idea to your modification is settled.
>
> Thank you.
I'll share my first analysis.
> In commit e7eea52b2d, you introduced a new function,
> RelationGetIdentityKeyBitmap(), which uses some odd logic for determining
> if a relation has a replica identity index. That code segfaults under certain
> conditions. A test case to demonstrate that is attached. Prior to patching
> the code, this new test gets stuck waiting for replication to finish, which never
> happens. You have to break out of the test and check
> tmp_check/log/021_no_replica_identity_publisher.log.
>
> I believe this bit of logic in src/backend/utils/cache/relcache.c:
>
> indexDesc = RelationIdGetRelation(relation->rd_replidindex);
> for (i = 0; i < indexDesc->rd_index->indnatts; i++)
>
> is unsafe without further checks, also attached.
You are absolutely right.
I checked the crash scenario and reproduced the core,
which has a null indexDesc. Also, rd_replidindex must be checked beforehand
as you included in your patch, because having an index does not necessarily
mean to have a replica identity index. As the proof of this, the oid of
rd_replidindex in the scenario is 0. OTOH, I've confirmed your new test
has passed with your fix.
Also, your test looks essentially minimum(suitable for the problem) to me.
* RelationGetIdentityKeyBitmap
+ /*
+ * Fall out if the description is not for an index, suggesting
+ * affairs have changed since we looked. XXX Should we log a
+ * complaint here?
+ */
+ if (!indexDesc)
+ return NULL;
+ if (!indexDesc->rd_index)
+ {
+ RelationClose(indexDesc);
+ return NULL;
+ }
For the 1st check, isn't it better to use RelationIsValid() ?
I agree with having the check itself of course, though.
Additionally, In what kind of actual scenario, did you think that
we come to the part to "log a complaint" ?
I'm going to spend some time to analyze RelationGetIndexAttrBitmap next
to know if similar hazards can happen, because RelationGetIdentityKeyBitmap's logic
comes from the function mainly.
Best Regards,
Takamichi Osumi
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2021-06-17 10:57:24 | Re: locking [user] catalog tables vs 2pc vs logical rep |
Previous Message | Fabien COELHO | 2021-06-17 09:52:04 | Re: pgbench bug candidate: negative "initial connection time" |