Re: Potential data loss due to race condition during logical replication slot creation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential data loss due to race condition during logical replication slot creation
Date: 2024-03-18 09:08:06
Message-ID: CAA4eK1KyUUSO9gYKukcwqcgrFWtyuZsfkspjMriKv4uQ_9WZRQ@mail.gmail.com
Lists: pgsql-bugs

On Wed, Mar 13, 2024 at 3:17 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > Attached patch implemented the approach a) since no one made. I also added
> > the test which can do assertion failure, but not sure it should be included.
>

I feel that setting "needs_full_snapshot" to true for decoding means the
snapshot will also start tracking non-catalog committed xacts, which
is costly. See SnapBuildCommitTxn(). Could we avoid this problem if we
had the list of all running xacts at the time we serialize the snapshot,
by not decoding any xact whose xid lies in that list? If so, one idea to
achieve this could be to maintain the highest_running_xid while
serializing the snapshot, and then during restore, if that
highest_running_xid is <= builder->initial_xmin_horizon, we
ignore restoring the snapshot. We already handle a few such cases
in SnapBuildRestore().
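
To make the idea concrete, here is a rough standalone sketch of the check as described above. It is not PostgreSQL code and not a patch: the struct names, the highest_running_xid field, and the sketch_* functions are all hypothetical, the xid comparison is a simplified wraparound-aware version that ignores special xids, and the sketch assumes at least one running xact at serialization time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's 32-bit transaction id type. */
typedef uint32_t TransactionId;

/* Hypothetical serialized snapshot: only the field this sketch needs. */
typedef struct SerializedSnapshotSketch
{
    TransactionId highest_running_xid;  /* hypothetical new field */
} SerializedSnapshotSketch;

/* Hypothetical builder state: only the field this sketch needs. */
typedef struct BuilderSketch
{
    TransactionId initial_xmin_horizon;
} BuilderSketch;

/*
 * Simplified wraparound-aware "a <= b" for xids. This deliberately ignores
 * the special (non-normal) xids that the real comparison has to handle.
 */
static bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) <= 0;
}

/*
 * At serialization time, remember the highest xid among the transactions
 * that are running. Assumes nrunning > 0.
 */
static void
sketch_serialize(SerializedSnapshotSketch *snap,
                 const TransactionId *running, int nrunning)
{
    assert(nrunning > 0);

    TransactionId highest = running[0];

    for (int i = 1; i < nrunning; i++)
    {
        if (xid_precedes_or_equals(highest, running[i]))
            highest = running[i];
    }
    snap->highest_running_xid = highest;
}

/*
 * At restore time, apply the condition from the mail: if the recorded
 * highest_running_xid is <= the builder's initial_xmin_horizon, ignore
 * the serialized snapshot. Returns true if the snapshot may be restored.
 */
static bool
sketch_should_restore(const SerializedSnapshotSketch *snap,
                      const BuilderSketch *builder)
{
    if (xid_precedes_or_equals(snap->highest_running_xid,
                               builder->initial_xmin_horizon))
        return false;           /* ignore restoring the snapshot */
    return true;
}
```

In this sketch, a snapshot serialized while xids 100, 103 and 105 were running records 105; a builder whose initial_xmin_horizon is 110 would skip it, while one whose horizon is 102 would still restore it.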

--
With Regards,
Amit Kapila.
