RE: Conflict detection for update_deleted in logical replication

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-10-25 06:17:06
Message-ID: TYAPR01MB5692541820BCC365C69442FFF54F2@TYAPR01MB5692.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Hou,

> Here is the V5 patch set which addressed above comments.

Thanks for updating the patch! While reviewing yours, I found a corner case that
a recently deleted tuple cannot be detected when index scan is chosen.
This can happen when indices are re-built during the replication.
Unfortunately, I don't have any solutions for it.

Found issue
========

When indices are built with the CONCURRENTLY option, a standard MVCC snapshot
is used to list up the tuples of the table, which means the new index ignores
recently deleted tuples.

This can cause the update_deleted to be wrongly detected as update_missing.
Assuming that we have a bidirectional cluster like case 1 [1] (id is a primary key),
and REINDEX CONCURRENTLY happens after executing DELETE but before receiving the UPDATE.
A primary key of t will be re-built by the REINDEX command but the dead tuples by
DELETE will be missed because it is invisible from the transaction.
Then, the apply worker receives the UPDATE and recognizes the target tuple is removed.
It scans with snapshotany to find the deleted tuple via the index, but it fails.
This event is reported as update_missing.

Reproduce steps
===========

This can be reproduced with v5 patch set:

1. constructed a 2-way replication by running attached.
2. stopped an apply worker on node2
3. executed `UPDATE tab SET a = 2;` on node1.
4. executed `DELETE FROM tab;` on node 2
5. executed `REINDEX TABLE CONCURRENTLY tab;` on node2
6. resumed the stopped apply worker
7. the worker should detect update_deleted, but it detected update_missing

```
LOG: conflict detected on relation "public.tab": conflict=update_missing
DETAIL: Could not find the row to be updated.
Remote tuple (2); replica identity (a)=(1).
``

[1]: https://www.postgresql.org/message-id/OS0PR01MB5716A78EE3D6F2F4754E1209946C2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
test_1025.sh application/octet-stream 1.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-10-25 06:23:00 Re: altering a column's collation leaves an invalid foreign key
Previous Message Tatsuo Ishii 2024-10-25 06:15:31 Re: pgbench: Improve result outputs related to failed transactinos