RE: Potential data loss due to race condition during logical replication slot creation

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: Potential data loss due to race condition during logical replication slot creation
Date: 2024-03-19 02:16:35
Message-ID: TYCPR01MB120774D17B90FFDAC087B959FF52C2@TYCPR01MB12077.jpnprd01.prod.outlook.com
Lists: pgsql-bugs

Dear Amit,

> I feel setting "needs_full_snapshot" to true for decoding means the
> snapshot will start tracking non-catalog committed xacts as well which
> is costly.

I think that approach was the most conservative one, since it does not require
changing the version of the snapshot. However, I understand that you want to
consider an optimized solution for HEAD first.

> See SnapBuildCommitTxn(). Can we avoid this problem if we
> would have list of all running xacts when we serialize the snapshot by
> not decoding any xact whose xid lies in that list? If so, one idea to
> achieve could be that we maintain the highest_running_xid while
> serializing the snapshot and then during restore if that
> highest_running_xid is <= builder->initial_xmin_horizon, then we
> ignore restoring the snapshot. We already have few such cases handled
> in SnapBuildRestore().

Based on that idea, I made a prototype. It passes the tests added by others and
by me. What do others think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment: 0001-Serialize-running-xacts.patch (application/octet-stream, 13.3 KB)
