From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Potential data loss due to race condition during logical replication slot creation |
Date: | 2024-03-27 10:37:02 |
Message-ID: | TYCPR01MB12077A67B15F682BC4DC835E4F5342@TYCPR01MB12077.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Sawada-san,
>
> With the PoC patch, we check ondisk.builder.is_there_running_xact in
> SnapBuildRestore(),
Yes, the PoC needs to read the snapshot state from the file before it can make that check.
> but can we just check running->xcnt in
> SnapBuildFindSnapshot() to skip calling SnapBuildRestore()? That is,
> if builder->initial_xmin_horizon is valid (or
> builder->finding_start_point is true) and running->xcnt > 0, we skip
> the snapshot restore.
IIUC, this approach does not require any API modifications, which may be an advantage.
> However, I think there are still cases where we
> unnecessarily skip snapshot restores
>
> Probably, what we would like to avoid is, we compute
> initial_xmin_horizon and start to find the initial start point while
> there is a concurrently running transaction, and then jump to the
> consistent state by restoring the consistent snapshot before the
> concurrent transaction commits.
Yeah, information from before the concurrent transactions commit should not be used. I think
that's why SnapBuildWaitSnapshot() waits until the listed transactions have finished.
> So we can ignore snapshot restores if
> (oldest XID among transactions running at the time of
> CreateInitDecodingContext()) >= (OldestRunningXID in
> xl_running_xacts).
>
> I've drafted this idea in the attached patch just for discussion.
Thanks for sharing the patch. At least, I confirmed that all tests and the workload you
pointed out in [1] passed. I will post here if I find other issues.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/