RE: Conflict detection for update_deleted in logical replication

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-01-07 11:22:04
Message-ID: OSCPR01MB14966F6B816880165E387758AF5112@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit, Hou,

> BTW, it is not clear how retaining dead tuples will help the detection
> update_origin_differs. Will it happen when the tuple is inserted or
> updated on the subscriber and then when we try to update the same
> tuple due to remote update, the commit_ts information of the xact is
> not available because the same is already removed by vacuum? This
> should happen for the update case for the new row generated by the
> update operation as that will be used in comparison. Can you please
> show it be a test case even if it is manual?

I've confirmed that retaining dead tuples helps detect update_origin_differs conflicts.
I put together a workload to demonstrate it.

## Workload

I ran the following steps:

1. Set up a publisher with a table containing one tuple.
2. Set up a subscriber with the same table; the same tuple was inserted locally (not replicated).
The subscription option was "copy_data = off", and the GUC setting was "track_commit_timestamp=on".
3. Installed the "xid_wraparound" extension on the subscriber.
4. Called `SELECT consume_xids(60000000);` on the subscriber side to advance the XID.
5. Ran VACUUM FREEZE for all the databases on the subscriber side.
6. Updated the tuple on the publisher side.

The key idea is that commit_ts entries can be removed once the corresponding
transaction ID is frozen. In this workload, 60 million transaction IDs are consumed,
which is a bit larger than vacuum_freeze_min_age.
I.e., step 5 can freeze the tuple that was inserted on the subscriber at step 2.
Running VACUUM FREEZE on all the databases also advances pg_database.datfrozenxid,
which tells the commit_ts module that entries for old transactions can be truncated.
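
One way to observe this on the subscriber after step 5 is to look at the freeze horizon and at the commit timestamp of the local tuple. The sketch below only prints the queries. The function name is hypothetical, the table "foo" is taken from the log in the Result section, and the expectation that the timestamp comes back NULL once the entry is gone is my assumption rather than something measured in this test.

```shell
#!/bin/sh
# Hypothetical helper: emits queries for inspecting the subscriber after
# VACUUM FREEZE. Assumes track_commit_timestamp=on and a table named "foo".
freeze_check_sql() {
cat <<EOF
-- Threshold that the 60 million consumed XIDs are compared against.
SHOW vacuum_freeze_min_age;
-- Per-database freeze horizon; the oldest one bounds commit_ts truncation.
SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
-- Commit timestamp of the locally inserted tuple (assumed to be NULL
-- once the commit_ts entry for its xmin has been truncated).
SELECT xmin, pg_xact_commit_timestamp(xmin) FROM foo;
EOF
}

# Possible invocation, assuming the subscriber listens on port 5433:
#   freeze_check_sql | psql -p 5433 -d postgres
```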

The attached script automates the test.

## Result

When detect_update_deleted of the subscription was set to false,
update_origin_differs was not detected on the subscriber.
In contrast, when detect_update_deleted was true, it was detected:

```
LOG: conflict detected on relation "public.foo": conflict=update_origin_differs
DETAIL: Updating the row that was modified locally in transaction 745 at ...
Existing local tuple (1, 1); remote tuple (1, 2); replica identity (a)=(1).
```

Based on this result, we can conclude that the option is also helpful for detecting
*_origin_differs conflicts.

> Can't it happen for delete_origin_differs as well for the same reason?

Right. I also tested that in almost the same way as above and got the same result.
You can confirm it by modifying the last statement in the attached script.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
test_origin_differ.sh application/octet-stream 1.6 KB
