From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Potential data loss due to race condition during logical replication slot creation |
Date: | 2024-03-27 08:26:05 |
Message-ID: | CAD21AoAHWJf10+jRnALY_=nyByE+BgDVV0gnXuiRhxAne+JJdA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Wed, Mar 27, 2024 at 4:54 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> Thanks for giving comments and sorry for late reply.
>
> > > If so, one idea to
> > > achieve could be that we maintain the highest_running_xid while
> > > serailizing the snapshot and then during restore if that
> > > highest_running_xid is <= builder->initial_xmin_horizon, then we
> > > ignore restoring the snapshot. We already have few such cases handled
> > > in SnapBuildRestore().
> >
> > I think that builder->initial_xmin_horizon could be older than
> > highest_running_xid, for example, when there is a logical replication
> > slot whose catalog_xmin is old. However, even in this case, we might
> > need to ignore restoring the snapshot. For example, a slightly
> > modified test case still can cause the same problem.
> >
> > The test case in the Kuroda-san's v2 patch:
> > permutation "s0_init" "s0_begin" "s0_insert1" "s1_init"
> > "s2_checkpoint" "s2_get_changes_slot0" "s0_insert2" "s0_commit"
> > "s1_get_changes_slot0"\ "s1_get_changes_slot1"
> >
> > Modified-version test case (add "s0_insert1" between "s0_init" and "s0_begin"):
> > permutation "s0_init" "s0_insert1" "s0_begin" "s0_insert1" "s1_init"
> > "s2_checkpoint" "s2_get_changes_slot0" "s0_insert2" "s0_commit"
> > "s1_get_changes_slot0\ " "s1_get_changes_slot1"
>
> Good catch, I confirmed that the variant still partially output transactions:
>
> ```
> step s1_get_changes_slot1: SELECT data FROM pg_logical_slot_get_changes('slot1', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
> data
> -----------------------------------------
> BEGIN
> table public.tbl: INSERT: val1[integer]:2
> COMMIT
> (3 rows)
> ```
>
> /////
>
> I analyzed a bit. In below descriptions, I uses these notations:
>
> txn0 - has a single insert
> txn1 - has begin-insert-commit operations.
> slot1 is created in parallel
>
> While creating slot0, initial_xmin_horizon was also set, and it come from
> GetOldestSafeDecodingTransactionId().
> Since there were no slots in the database before, the value was just a next xid.
>
> Then, the pinned xid was assigned to the txn0.
> It is correct behavior because slot0 must decode txn0.
>
> Finally, while creating slot1 after the txn1, the snapshot serialized by slot0 was
> restored once. txn1 was the recorded as the highest_running, but initial_xmin was
> older than it. So the snapshot was restored and txn1 was partially decoded.
>
> initial_xmin_horizon (txn0) < highest_running_xid (txn1)
>
> /////
>
> So, how should we fix? One idea is based on (c), and adds a flag which indicates
> an existence of concurrent transactions. The restoration is skipped if both two
> flags are true. PSA the PoC patch.
>
With the PoC patch, we check ondisk.builder.is_there_running_xact in
SnapBuildRestore(), but can we just check running->xcnt in
SnapBuildFindSnapshot() to skip calling SnapBuildRestore()? That is,
if builder->initial_xmin_horizon is valid (or
builder->finding_start_point is true) and running->xcnt > 0, we skip
the snapshot restore. However, I think there are still cases where we
unnecessarily skip snapshot restores.
Probably, what we would like to avoid is, we compute
initial_xmin_horizon and start to find the initial start point while
there is a concurrently running transaction, and then jump to the
consistent state by restoring the consistent snapshot before the
concurrent transaction commits. So we can ignore snapshot restores if
(oldest XID among transactions running at the time of
CreateInitDecodingContext()) >= (OldestRunningXID in
xl_running_xacts).
I've drafted this idea in the attached patch just for discussion.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
0001-draft-fix.patch | application/octet-stream | 17.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2024-03-27 08:58:49 | Re: BUG #18409: After my windows update, I can not run postgre 16 server |
Previous Message | PG Bug reporting form | 2024-03-27 08:25:39 | BUG #18409: After my windows update, I can not run postgre 16 server |