From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | RE: Conflict detection for update_deleted in logical replication |
Date: | 2024-10-25 06:17:06 |
Message-ID: | TYAPR01MB5692541820BCC365C69442FFF54F2@TYAPR01MB5692.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Hou,
> Here is the V5 patch set which addressed above comments.
Thanks for updating the patch! While reviewing yours, I found a corner case that
a recently deleted tuple cannot be detected when index scan is chosen.
This can happen when indices are re-built during the replication.
Unfortunately, I don't have any solutions for it.
Found issue
========
When indices are built with the CONCURRENTLY option, a standard MVCC snapshot
is used to list up the tuples of the table, which means the new index ignores
recently deleted tuples.
This can cause the update_deleted to be wrongly detected as update_missing.
Assuming that we have a bidirectional cluster like case 1 [1] (id is a primary key),
and REINDEX CONCURRENTLY happens after executing DELETE but before receiving the UPDATE.
A primary key of t will be re-built by the REINDEX command but the dead tuples by
DELETE will be missed because it is invisible from the transaction.
Then, the apply worker receives the UPDATE and recognizes the target tuple is removed.
It scans with snapshotany to find the deleted tuple via the index, but it fails.
This event is reported as update_missing.
Reproduce steps
===========
This can be reproduced with v5 patch set:
1. constructed a 2-way replication by running attached.
2. stopped an apply worker on node2
3. executed `UPDATE tab SET a = 2;` on node1.
4. executed `DELETE FROM tab;` on node 2
5. executed `REINDEX TABLE CONCURRENTLY tab;` on node2
6. resumed the stopped apply worker
7. the worker should detect update_deleted, but it detected update_missing
```
LOG: conflict detected on relation "public.tab": conflict=update_missing
DETAIL: Could not find the row to be updated.
Remote tuple (2); replica identity (a)=(1).
``
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
test_1025.sh | application/octet-stream | 1.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2024-10-25 06:23:00 | Re: altering a column's collation leaves an invalid foreign key |
Previous Message | Tatsuo Ishii | 2024-10-25 06:15:31 | Re: pgbench: Improve result outputs related to failed transactinos |