Re: speed up a logical replica setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-07-17 10:10:08
Message-ID: CAA4eK1J5t48z2mfpNdCT8OunECR7qJZmHUiDnrdunU+z83u_oA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 17, 2024 at 1:23 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> I also analyzed this failure, let me share it. Here, I think events in below
> ordering were occurred.
>
> 1. Backend created a publication on $db2,
> 2. BGWriter generated RUNNING_XACT record, then
> 3. Backend created a replication slot on $db2.
>
> In this case, the recovery_target_lsn is ahead of the RUNNING_XACT record generated
> at step 3. Also, since both bgwriter and slot creation mark the record as
> *UNIMPORTANT* one, the writer won't start again even after the
> LOG_SNAPSHOT_INTERVAL_MS. The rule is written in BackgroundWriterMain():
>
> ```
> /*
> * Only log if enough time has passed and interesting records have
> * been inserted since the last snapshot. Have to compare with <=
> * instead of < because GetLastImportantRecPtr() points at the
> * start of a record, whereas last_snapshot_lsn points just past
> * the end of the record.
> */
> if (now >= timeout &&
> last_snapshot_lsn <= GetLastImportantRecPtr())
> {
> last_snapshot_lsn = LogStandbySnapshot();
> last_snapshot_ts = now;
> }
> ```
>
> Therefore, pg_createsubscriber waited until a new record was replicated, but no
> activities were recorded, causing a timeout. Since this is a timing issue, Alexander
> could reproduce the failure with shorter time duration and parallel running.
>

Your analysis sounds correct to me.

> IIUC, the root cause is that pg_create_logical_replication_slot() returns a LSN
> which is not generated yet. So, I think both mine [1] and Euler's approach [2]
> can solve the issue. My proposal was to add an extra WAL record after the final
> slot creation, and Euler's one was to use a restart_lsn as the recovery_target_lsn.
>

I don't think it is correct to set restart_lsn as consistent_lsn point
because the same is used to set replication origin progress. Later
when we start the subscriber, the system will use that LSN as a
start_decoding_at point which is the point after which all the commits
will be replicated. So, we will end up incorrectly using restart_lsn
(LSN from where we start reading the WAL) as start_decoding_at point.
How could that be correct?

Now, even if we use restart_lsn as recovery_target_lsn and the LSN
returned by pg_create_logical_replication_slot() as consistent LSN to
set replication progress, that also could lead to data loss because
the subscriber may never get data between restart_lsn value and
consistent LSN value.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2024-07-17 10:13:04 Re: PG_TEST_EXTRA and meson
Previous Message Matthias van de Meent 2024-07-17 09:33:05 Re: Expand applicability of aggregate's sortop optimization