From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Potential data loss due to race condition during logical replication slot creation |
Date: | 2024-03-27 07:54:12 |
Message-ID: | TYCPR01MB12077ADB13450B422B6161A01F5342@TYCPR01MB12077.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Dear Sawada-san,
Thanks for giving comments and sorry for late reply.
> > If so, one idea to
> > achieve could be that we maintain the highest_running_xid while
> > serailizing the snapshot and then during restore if that
> > highest_running_xid is <= builder->initial_xmin_horizon, then we
> > ignore restoring the snapshot. We already have few such cases handled
> > in SnapBuildRestore().
>
> I think that builder->initial_xmin_horizon could be older than
> highest_running_xid, for example, when there is a logical replication
> slot whose catalog_xmin is old. However, even in this case, we might
> need to ignore restoring the snapshot. For example, a slightly
> modified test case still can cause the same problem.
>
> The test case in the Kuroda-san's v2 patch:
> permutation "s0_init" "s0_begin" "s0_insert1" "s1_init"
> "s2_checkpoint" "s2_get_changes_slot0" "s0_insert2" "s0_commit"
> "s1_get_changes_slot0"\ "s1_get_changes_slot1"
>
> Modified-version test case (add "s0_insert1" between "s0_init" and "s0_begin"):
> permutation "s0_init" "s0_insert1" "s0_begin" "s0_insert1" "s1_init"
> "s2_checkpoint" "s2_get_changes_slot0" "s0_insert2" "s0_commit"
> "s1_get_changes_slot0\ " "s1_get_changes_slot1"
Good catch, I confirmed that the variant still partially output transactions:
```
step s1_get_changes_slot1: SELECT data FROM pg_logical_slot_get_changes('slot1', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
data
-----------------------------------------
BEGIN
table public.tbl: INSERT: val1[integer]:2
COMMIT
(3 rows)
```
/////
I analyzed a bit. In below descriptions, I uses these notations:
txn0 - has a single insert
txn1 - has begin-insert-commit operations.
slot1 is created in parallel
While creating slot0, initial_xmin_horizon was also set, and it come from
GetOldestSafeDecodingTransactionId().
Since there were no slots in the database before, the value was just a next xid.
Then, the pinned xid was assigned to the txn0.
It is correct behavior because slot0 must decode txn0.
Finally, while creating slot1 after the txn1, the snapshot serialized by slot0 was
restored once. txn1 was the recorded as the highest_running, but initial_xmin was
older than it. So the snapshot was restored and txn1 was partially decoded.
initial_xmin_horizon (txn0) < highest_running_xid (txn1)
/////
So, how should we fix? One idea is based on (c), and adds a flag which indicates
an existence of concurrent transactions. The restoration is skipped if both two
flags are true. PSA the PoC patch.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/
Attachment | Content-Type | Size |
---|---|---|
0001-Serialize-a-flag-and-use-finding_start_point.patch | application/octet-stream | 15.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2024-03-27 08:25:39 | BUG #18409: After my windows update, I can not run postgre 16 server |
Previous Message | Andrew Dunstan | 2024-03-27 07:42:21 | Re: BUG #18379: LDAP bind password exposed |