Re: speed up a logical replica setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-07-18 02:37:44
Message-ID: CAA4eK1LSqM3tF_TgEr-PsztGL6hLJTcojwBTycTnpa7ucjfKjQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 17, 2024 at 5:28 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > Your analysis sounds correct to me.
>
> Okay, so we share the same picture.
>
> > > IIUC, the root cause is that pg_create_logical_replication_slot() returns an
> > > LSN that has not been generated yet. So, I think both my approach [1] and
> > > Euler's approach [2] can solve the issue. My proposal was to add an extra WAL
> > > record after the final slot creation, and Euler's was to use the restart_lsn
> > > as the recovery_target_lsn.
> > >
> >
> > I don't think it is correct to set restart_lsn as the consistent_lsn point
> > because the same LSN is used to set the replication origin progress. Later,
> > when we start the subscriber, the system will use that LSN as a
> > start_decoding_at point which is the point after which all the commits
> > will be replicated. So, we will end up incorrectly using restart_lsn
> > (LSN from where we start reading the WAL) as start_decoding_at point.
> > How could that be correct?
>
> I didn't say we could use restart_lsn as the consistent point of logical
> replication, but I agree the approach has issues.
>
> > Now, even if we use restart_lsn as recovery_target_lsn and the LSN
> > returned by pg_create_logical_replication_slot() as consistent LSN to
> > set the replication progress, that could also lead to data loss because
> > the subscriber may never receive data committed between the restart_lsn
> > and the consistent LSN.
>
> Are you considering the case where, e.g., tuples were inserted just after the
> restart_lsn but before the RUNNING_XACTS record?

I am thinking of transactions between restart_lsn and the "consistent
point LSN" (aka the point after which all commits will be replicated).
Your conclusion seems correct to me: such transactions won't be
replicated by streaming replication and would be skipped by logical
replication. Now, if we can avoid that somehow, we can consider that
approach.
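To make the gap concrete, here is a minimal sketch under a deliberately simplified model (an assumption, not PostgreSQL's actual code): LSNs are plain integers, each transaction is reduced to its commit LSN, physical recovery applies every commit at or before recovery_target_lsn, and logical replication sends only commits after the consistent point. The names Commit and lost_transactions are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    # Hypothetical simplified transaction: an ID plus the LSN of its commit record.
    xid: int
    commit_lsn: int


def applied_by_physical_recovery(commits, recovery_target_lsn):
    # Simplified model: the standby replays WAL up to and including the target.
    return {c.xid for c in commits if c.commit_lsn <= recovery_target_lsn}


def applied_by_logical_replication(commits, consistent_lsn):
    # Simplified model: only commits after the consistent point
    # (start_decoding_at) are sent to the subscriber.
    return {c.xid for c in commits if c.commit_lsn > consistent_lsn}


def lost_transactions(commits, recovery_target_lsn, consistent_lsn):
    # Transactions that neither mechanism delivers to the subscriber.
    seen = applied_by_physical_recovery(commits, recovery_target_lsn) | \
        applied_by_logical_replication(commits, consistent_lsn)
    return sorted(c.xid for c in commits if c.xid not in seen)


commits = [Commit(1, 90), Commit(2, 120), Commit(3, 140), Commit(4, 200)]
# Using restart_lsn (100) as the recovery target while setting the origin
# progress to the consistent LSN (150) drops the commits in between.
print(lost_transactions(commits, recovery_target_lsn=100, consistent_lsn=150))
# Using the same LSN for both leaves no gap in this model.
print(lost_transactions(commits, recovery_target_lsn=150, consistent_lsn=150))
```

In this toy model, any commit whose LSN falls in the half-open range (restart_lsn, consistent LSN] is skipped by both mechanisms. The gap disappears only when the recovery target and the consistent point coincide.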

--
With Regards,
Amit Kapila.
