RE: Found issues related with logical replication and 2PC

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: RE: Found issues related with logical replication and 2PC
Date: 2024-08-09 05:03:55
Message-ID: TYAPR01MB56927A6DBB25C9C79B16D9C4F5BA2@TYAPR01MB5692.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit,

>
> The code changes look mostly good to me. I have changed/added a few
> comments in the attached modified version.
>

Thanks for updating the patch! It LGTM. I've tested your patch and confirmed
that it does not cause data loss. I used a source tree with v3 applied, plus a small
additional change that logs the type of each replication message [1].

Method
======

1. Construct a logical replication system with two_phase = true and
synchronous_commit = false.
2. Attach to the walwriter of the subscriber to stop the process.
3. Start a transaction on the publisher and PREPARE it.
4. Wait until the apply worker replies to the publisher.
5. Stop the subscriber.
6. Restart the subscriber.
7. Run COMMIT PREPARED on the publisher.

The attached script reproduces this situation.

Result
======

After step 5, I ran pg_waldump and confirmed that the PREPARE record existed on
the subscriber.

```
$ pg_waldump data_sub/pg_wal/000000010000000000000001
...
rmgr: Transaction len..., desc: PREPARE gid pg_gid_16389_741: ...
rmgr: XLOG len..., desc: CHECKPOINT_SHUTDOWN ...
```
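For repeated runs, the check above can be scripted. The following is only a minimal sketch: `prepared_gids` is a hypothetical helper name, and it assumes the record description has the form shown above (`desc: PREPARE gid <gid>: ...`).

```shell
# Hypothetical helper: read pg_waldump output on stdin and print the GID of
# every PREPARE record found in it. Assumes "desc: PREPARE gid <gid>: ..."
# as in the output above; lines without such a record print nothing.
prepared_gids() {
    sed -n 's/.*desc: PREPARE gid \([^:]*\):.*/\1/p'
}
```

It could be used as `pg_waldump data_sub/pg_wal/000000010000000000000001 | prepared_gids`; an empty result would mean no PREPARE record was found in that segment.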

Also, after step 7, I confirmed that only the COMMIT PREPARED record
was sent, because the log contained the line below. "75" is the ASCII code for 'K',
which indicates that the replication message was COMMIT PREPARED.
```
LOG: XXX got message 75
```
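The numeric code in that log line can be decoded with a small shell sketch. The helper names are hypothetical, and the letter-to-message mapping for the two-phase messages is my reading of the LogicalRepMsgType values in logicalproto.h, so please verify it against the source tree being tested.

```shell
# Print the ASCII character for a decimal message code, e.g. 75 -> K.
msg_char() {
    printf "\\$(printf '%03o' "$1")"
}

# Map the code to a two-phase message name.
# Assumption: letters follow LogicalRepMsgType in logicalproto.h.
msg_name() {
    case "$(msg_char "$1")" in
        b) echo "BEGIN PREPARE" ;;
        P) echo "PREPARE" ;;
        K) echo "COMMIT PREPARED" ;;
        r) echo "ROLLBACK PREPARED" ;;
        *) echo "other" ;;
    esac
}
```

For example, `msg_name 75` prints `COMMIT PREPARED`, matching the log line above.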

Additionally, I ran another test, which is basically the same as above except that
1) XLogFlush() in EndPrepare() was commented out and 2) kill -9 was used at step 5 to
emulate a crash. Since the PREPAREd transaction cannot survive on the subscriber in this
case, the COMMIT PREPARED command on the publisher causes an ERROR on the subscriber.
```
ERROR: prepared transaction with identifier "pg_gid_16389_741" does not exist
CONTEXT: processing remote data for replication origin "pg_16389" during message
type "COMMIT PREPARED" in transaction 741, finished at 0/15463C0
```
I think this shows that the backend process ensures the WAL is persisted, so data loss
won't occur.
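As a side note, the GID in the error above lines up with the CONTEXT line: "pg_gid_16389_741" carries the same 16389 as the origin name "pg_16389" and the same 741 as the remote transaction. A minimal sketch for splitting such a GID follows; `gid_parts` is a hypothetical helper, and the field labels assume the "pg_gid_<subid>_<xid>" layout.

```shell
# Hypothetical helper: split a GID of the assumed form pg_gid_<subid>_<xid>.
gid_parts() {
    rest=${1#pg_gid_}                            # strip the fixed prefix
    echo "subid=${rest%%_*} xid=${rest##*_}"     # text before / after the "_"
}
```

Calling `gid_parts pg_gid_16389_741` prints `subid=16389 xid=741`.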

[1]:
```
@@ -3297,6 +3297,8 @@ apply_dispatch(StringInfo s)
 	saved_command = apply_error_callback_arg.command;
 	apply_error_callback_arg.command = action;
 
+	elog(LOG, "XXX got message %d", action);
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment: test_0809.sh (application/octet-stream, 1.2 KB)
