RE: Re:Re: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'tys' <sytoptimisprime(at)163(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: Re:Re: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.
Date: 2024-01-15 12:58:08
Message-ID: TY3PR01MB98897597238BA6283A2A5B64F56C2@TY3PR01MB9889.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Dear Song,

>
I see that snapshot changes is added to the tail of txn->changes list in
SnapBuildDistributeNewCatalogSnapshot. When the DML transaction is
committed, the corresponding entry in RelationSyncCache has already been
invalidated. However, snapshot hasn't been changed as this time (The newly added
snapshot change is still in the list and hasn't been invoked.), so as you say: "
pgoutput_change continues to see 'relentry->pubactions.pubinsert' as false,
even after re fetching the relation entry after the invalidation."

Therefore, there is one solution, as Andres Freund[1] states, that if a transaction is
found to have modified catalog during logical decoding, invalidations are forced to
be assigned to all concurrent in-progress transactions, just as my patch did.

[1] https://www.postgresql.org/message-id/20231118025445.crhaeeuvoe2g5dv6%40awork3.anarazel.de
>

Thanks for making a patch, but your patch cannot be built:

```
reorderbuffer.c: In function ‘ReorderBufferDistributeInvalidation’:
../../../../src/include/lib/ilist.h:594:2: error: expected identifier before ‘(’ token
(AssertVariableIsOfTypeMacro(ptr, dlist_node *), \
^
reorderbuffer.c:2648:8: note: in expansion of macro ‘dlist_container’
txn->dlist_container(ReorderBufferTXN, node, txn_i.cur);
^~~~~~~~~~~~~~~
```

Based on your arguments, I revised your patch. Is it same as your expectations?
This also fixes some coding conventions.

>
Although my approach seems to be effective,
>

I ran my reproducer and confirmed that the issue was fixed.

>
it may bring additional overhead to logical decoding
under special circumstances. I think upgrading lock mentioned in [1] is also an effective solution,
and it seems to have less impact. But I don't quite understand why it may cause deadlocks mentioned
in [1].
>

IIUC, no one pointed out a concrete case for deadlocks, but anyone could not say
that it is safe, neither. We must understand the thread more and must decide how
we fix: modify the lock level or send invalidation messages to all the reorderbuffer
transactions. How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v2-0001-fix-logical-replication-data-sync-bug.patch application/octet-stream 3.2 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Devrim Gündüz 2024-01-15 15:23:21 Re: BUG #18289: postgresql14-devel-14.10-2PGDG.rhel8.x86_64.rpm Contains invalid cLang option in Makefile.global
Previous Message Devrim Gündüz 2024-01-15 12:06:09 Re: BUG #18293: Rocky 8 postgres install failing as nothing provides libarmadillo.so.10()(64bit)