Re: Forget close an open relation in ReorderBufferProcessTXN()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Japin Li <japinli(at)hotmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Forget close an open relation in ReorderBufferProcessTXN()
Date: 2021-05-17 09:44:33
Message-ID: CAA4eK1KoLz4bciTqRwPM1b4SEsb=UbjYzzvAOSK=XVPKM3VA6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 14, 2021 at 2:20 PM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Thursday, May 13, 2021 7:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I don't think we can reproduce it with core plugins as they don't lock user
> > catalog tables.
> OK. My current understanding about how the deadlock happens is below.
>
> 1. TRUNCATE command is performed on user_catalog_table.
> 2. TRUNCATE command locks the table and index with ACCESS EXCLUSIVE LOCK.
> 3. TRUNCATE waits for the subscriber's synchronization
> when synchronous_standby_names is set.
> 4. Here, the walsender stops, *if* it tries to acquire a lock on the user_catalog_table
> because the table where it wants to see is locked by the TRUNCATE already.
>
> If this is right,
>

Yeah, the above steps are correct, so if we take a lock on
user_catalog_table when walsender is processing the WAL, it would lead
to a problem.

> we need to go back to a little bit higher level discussion,
> since whether we should hack any plugin to simulate the deadlock caused by user_catalog_table reference
> with locking depends on the assumption if the plugin takes a lock on the user_catalog_table or not.
> In other words, if the plugin does read only access to that type of table with no lock
> (by RelationIdGetRelation for example ?), the deadlock concern disappears and we don't
> need to add anything to plugin sides, IIUC.
>

True, if the plugin doesn't acquire any lock on user_catalog_table,
then it is fine but we don't prohibit plugins to acquire locks on
user_catalog_tables. This is similar to system catalogs, the plugins
and decoding code do acquire lock on those.

> Here, we haven't gotten any response about whether output plugin takes (should take)
> the lock on the user_catalog_table. But, I would like to make a consensus
> about this point before the implementation.
>
> By the way, Amit-san already mentioned the main reason of this
> is that we allow decoding of TRUNCATE operation for user_catalog_table in synchronous mode.
> The choices are provided by Amit-san already in the past email in [1].
> (1) disallow decoding of TRUNCATE operation for user_catalog_tables
> (2) disallow decoding of any operation for user_catalog_tables like system catalog tables
>
> Yet, I'm not sure if either option solves the deadlock concern completely.
> If application takes an ACCESS EXCLUSIVE lock by LOCK command (not by TRUNCATE !)
> on the user_catalog_table in a transaction, and if the plugin tries to take a lock on it,
> I think the deadlock happens. Of course, having a consensus that the plugin takes no lock at all
> would remove this concern, though.
>

This is true for system catalogs as well. See the similar report [1]

> Like this, I'd like to discuss those two items in question together at first.
> * the plugin should take a lock on user_catalog_table or not
> * the range of decoding related to user_catalog_table
>
> To me, taking no lock on the user_catalog_table from plugin is fine
>

We allow taking locks on system catalogs, so why prohibit
user_catalog_tables? However, I agree that if we want plugins to
acquire the lock on user_catalog_tables then we should either prohibit
decoding of such relations or do something else to avoid deadlock
hazards.

[1] - https://www.postgresql.org/message-id/CALDaNm1UB%3D%3DgL9Poad4ETjfcyGdJBphWEzEZocodnBd--kJpVw%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2021-05-17 09:51:05 Re: Winflex docs and distro
Previous Message tanghy.fnst@fujitsu.com 2021-05-17 09:36:33 RE: "ERROR: deadlock detected" when replicating TRUNCATE