Re: Truncate in synchronous logical replication failed

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Japin Li <japinli(at)hotmail(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Truncate in synchronous logical replication failed
Date: 2021-04-16 08:52:27
Message-ID: CAA4eK1+ShZgDvFApVK_PPfM8f+XzN4skb7Mj54H3MEAgN5OxWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 16, 2021 at 12:55 PM Japin Li <japinli(at)hotmail(dot)com> wrote:
>
> On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli(at)hotmail(dot)com> wrote:
> >>
> >>
> >> On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >> > On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >> >>
> >> >> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
> >> >> <petr(dot)jelinek(at)enterprisedb(dot)com> wrote:
> >> >> >
> >> >> > > On 12 Apr 2021, at 08:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >> >> > >
> >> >> > > The problem happens only when we try to fetch IDENTITY_KEY attributes
> >> >> > > because pgoutput uses RelationGetIndexAttrBitmap() to get that
> >> >> > > information which locks the required indexes. Now, because TRUNCATE
> >> >> > > has already acquired an exclusive lock on the index, it seems to
> >> >> > > create a sort of deadlock where the actual Truncate operation waits
> >> >> > > for logical replication of operation to complete and logical
> >> >> > > replication waits for actual Truncate operation to finish.
> >> >> > >
> >> >> > > Do we really need to use RelationGetIndexAttrBitmap() to build
> >> >> > > IDENTITY_KEY attributes? During decoding, we don't even lock the main
> >> >> > > relation, we just scan the system table and build that information
> >> >> > > using a historic snapshot. Can't we do something similar here?
> >> >> > >
> >> >> > > Adding Petr J. and Peter E. to know their views as this seems to be an
> >> >> > > old problem (since the decoding of Truncate operation is introduced).
> >> >> >
> >> >> > We used RelationGetIndexAttrBitmap because it already existed, no other reason.
> >> >> >
> >> >>
> >> >> Fair enough. But I think we should do something about it because using
> >> >> the same (RelationGetIndexAttrBitmap) just breaks the synchronous
> >> >> logical replication. I think this is broken since the logical
> >> >> replication of Truncate is supported.
> >> >>
> >> >> > I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.
> >> >> >
> >> >>
> >> >> Are you telling that we need AccessShareLock on the index? If so, why
> >> >> is it different from how we access the relation during decoding
> >> >> (basically in ReorderBufferProcessTXN, we directly use
> >> >> RelationIdGetRelation() without any lock on the relation)? I think we
> >> >> do it that way because we need it to process WAL entries and we need
> >> >> the relation state based on the historic snapshot, so, even if the
> >> >> relation is later changed/dropped, we are fine with the old state we
> >> >> got. Isn't the same thing applies here in logicalrep_write_attrs? If
> >> >> that is true then some equivalent of RelationGetIndexAttrBitmap where
> >> >> we use RelationIdGetRelation instead of index_open should work?
> >> >>
> >> >
> >> > Today, again I have thought about this and don't see a problem with
> >> > the above idea. If the above understanding is correct, then I think
> >> > for our purpose in pgoutput, we just need to call RelationGetIndexList
> >> > and then build the idattr list for relation->rd_replidindex.
> >>
> >> Sorry, I don't know how can we build the idattr without open the index.
> >> If we should open the index, then we should use NoLock, since the TRUNCATE
> >> side hold AccessExclusiveLock. As Osumi points out in [1], The NoLock mode
> >> assumes that the appropriate lock on the index is already taken.
> >>
> >
> > Why can't we use RelationIdGetRelation() by passing the required
> > indexOid to it?
>
> Hi Amit, as your suggested, I try to use RelationIdGetRelation() replace
> the index_open() to avoid specify the AccessSharedLock, then the TRUNCATE
> will not be blocked.
>

It is okay as POC but we can't change the existing function
RelationGetIndexAttrBitmap. It is used from other places as well. It
might be better to write a separate function for this, something like
what Osumi-San's patch is trying to do.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-04-16 09:19:08 RE: Truncate in synchronous logical replication failed
Previous Message Amit Kapila 2021-04-16 08:50:06 Re: Truncate in synchronous logical replication failed