Re: Truncate in synchronous logical replication failed

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Truncate in synchronous logical replication failed
Date: 2021-04-14 10:01:13
Message-ID: CAA4eK1KzpTzo0kUs1b0s4xb_apuZdRG5z2FnUSmOVF8eoQ2cVQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
<petr(dot)jelinek(at)enterprisedb(dot)com> wrote:
>
> > On 12 Apr 2021, at 08:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > The problem happens only when we try to fetch IDENTITY_KEY attributes
> > because pgoutput uses RelationGetIndexAttrBitmap() to get that
> > information which locks the required indexes. Now, because TRUNCATE
> > has already acquired an exclusive lock on the index, it seems to
> > create a sort of deadlock where the actual Truncate operation waits
> > for logical replication of the operation to complete and logical
> > replication waits for actual Truncate operation to finish.
> >
> > Do we really need to use RelationGetIndexAttrBitmap() to build
> > IDENTITY_KEY attributes? During decoding, we don't even lock the main
> > relation, we just scan the system table and build that information
> > using a historic snapshot. Can't we do something similar here?
> >
> > Adding Petr J. and Peter E. to get their views, as this seems to be an
> > old problem (present since decoding of the Truncate operation was introduced).
>
> We used RelationGetIndexAttrBitmap because it already existed, no other reason.
>

Fair enough. But I think we should do something about it, because using
RelationGetIndexAttrBitmap here simply breaks synchronous logical
replication. I think this has been broken ever since logical
replication of Truncate was introduced.
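To make the cycle described above concrete, here is a toy wait-for model
in C. It is not PostgreSQL code: TRUNCATE_BACKEND, WALSENDER, waits_for,
model_scenario() and has_wait_cycle() are hypothetical names used only for
illustration. One edge is the TRUNCATE backend waiting at commit for the
synchronous standby (which needs the walsender to decode and send the
change); the other is the walsender waiting for the index lock while
pgoutput calls RelationGetIndexAttrBitmap().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: two processes, at most one outgoing wait edge each. */
enum { TRUNCATE_BACKEND = 0, WALSENDER = 1, N_PROCS = 2 };

/* waits_for[p] is the process p is blocked on, or -1 if p can proceed. */
static int waits_for[N_PROCS];

/*
 * The TRUNCATE backend holds an exclusive lock on the index and, at commit,
 * waits for the synchronous standby, i.e. for the walsender to decode and
 * send the change. If pgoutput locks the index while building the
 * identity-key bitmap, the walsender in turn waits for the TRUNCATE backend
 * to release that lock.
 */
static void model_scenario(bool walsender_locks_index)
{
    waits_for[TRUNCATE_BACKEND] = WALSENDER;
    waits_for[WALSENDER] = walsender_locks_index ? TRUNCATE_BACKEND : -1;
}

/* Follow wait edges from 'start'; return true if they lead back to it. */
static bool has_wait_cycle(int start)
{
    assert(start >= 0 && start < N_PROCS);
    int cur = start;
    for (int steps = 0; steps < N_PROCS; steps++) {
        cur = waits_for[cur];
        if (cur < 0)
            return false;   /* chain ends at a process that can proceed */
        if (cur == start)
            return true;    /* back where we started: a cycle */
    }
    return false;
}
```

With walsender_locks_index set to true the two edges form a cycle and
neither side can finish; with it false (the walsender reads the catalog
without locking the index) the chain ends at the walsender, so both sides
can make progress.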

> I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.
>

Are you saying that we need AccessShareLock on the index? If so, why
is that different from how we access the relation during decoding
(in ReorderBufferProcessTXN, we directly call RelationIdGetRelation()
without taking any lock on the relation)? I think we do it that way
because we need the relation to process WAL entries, and we need the
relation state as of the historic snapshot; so even if the relation is
later changed or dropped, the old state we got is fine. Doesn't the
same apply here in logicalrep_write_attrs? If so, then some equivalent
of RelationGetIndexAttrBitmap that uses RelationIdGetRelation instead
of index_open should work. Am I missing something?
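A minimal sketch of the idea, assuming a simplified, hypothetical catalog
representation (ToyIndexInfo, ToyReplIdent and toy_identity_key_bitmap()
are not PostgreSQL APIs): the identity-key columns are computed only from
index definitions already read under the historic snapshot, so nothing
here opens or locks an index. It does not show how those definitions are
fetched (in the proposal, via RelationIdGetRelation rather than
index_open), it leaves out REPLICA IDENTITY FULL and NOTHING, and the real
RelationGetIndexAttrBitmap returns a Bitmapset rather than a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified index definition (loosely modeled on pg_index:
 * indisprimary / indisreplident / indkey). */
typedef struct ToyIndexInfo {
    bool is_primary;
    bool is_replica_identity;
    int  nkeys;
    int  keys[4];              /* 1-based attribute numbers */
} ToyIndexInfo;

/* DEFAULT uses the primary key; INDEX uses the index marked as identity. */
typedef enum { TOY_REPLIDENT_DEFAULT, TOY_REPLIDENT_INDEX } ToyReplIdent;

/*
 * Build the identity-key mask purely from catalog data that was already
 * read; no index is opened or locked here. Bit (attno - 1) is set for each
 * key column of the chosen index; 0 means no identity index was found.
 */
static uint64_t toy_identity_key_bitmap(const ToyIndexInfo *indexes,
                                        int nindexes, ToyReplIdent ident)
{
    for (int i = 0; i < nindexes; i++) {
        const ToyIndexInfo *idx = &indexes[i];
        bool chosen = (ident == TOY_REPLIDENT_DEFAULT)
                          ? idx->is_primary
                          : idx->is_replica_identity;
        if (!chosen)
            continue;

        uint64_t bits = 0;
        for (int k = 0; k < idx->nkeys; k++) {
            int attno = idx->keys[k];
            assert(attno >= 1 && attno <= 64);
            bits |= UINT64_C(1) << (attno - 1);
        }
        return bits;
    }
    return 0;
}
```

For a table with a primary key on column 1 and a replica identity index
on columns 2 and 3, the DEFAULT case yields 0x1 and the INDEX case 0x6.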

--
With Regards,
Amit Kapila.
