Re: speed up a logical replica setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speed up a logical replica setup
Date: 2024-07-15 06:46:57
Message-ID: CAA4eK1KDon8qYLQRDmnqJq5LdD3dgZjX_sOq1ekBNav4dkqChQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 12, 2024 at 4:54 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Thu, Jul 11, 2024, at 2:00 PM, Alexander Lakhin wrote:
>
> May I ask you to look at another failure of the test occurred today [1]?
>
>
> Thanks for the report!
>
> You are observing the same issue that Amit explained in [1]. The
> pg_create_logical_replication_slot returns the EndRecPtr (see
> slot->data.confirmed_flush in DecodingContextFindStartpoint()). EndRecPtr points
> to the next record and it is a future position for an idle server. That's why
> the recovery takes some time to finish because it is waiting for an activity to
> increase the LSN position. Since you modified LOG_SNAPSHOT_INTERVAL_MS to create
> additional WAL records soon, the EndRecPtr position is reached rapidly and the
> recovery ends quickly.
>

If the recovery ends quickly (which is expected due to reduced
LOG_SNAPSHOT_INTERVAL_MS ) then why do we see "error: recovery timed
out"?

> Hayato proposes a patch [2] to create an additional WAL record that has the same
> effect from you little hack: increase the LSN position to allow the recovery
> finishes soon. I don't like the solution although it seems simple to implement.
> As Amit said if we know the ReadRecPtr, we could use it as consistent LSN. The
> problem is that it is used by logical decoding but it is not exposed. [reading
> the code...] When the logical replication slot is created, restart_lsn points to
> the lastReplayedEndRecPtr (see ReplicationSlotReserveWal()) that is the last
> record replayed.
>

The last 'lastReplayedEndRecPtr' should be the value of restart_lsn on
standby (when RecoveryInProgress is true) but here we are creating
slots on the publisher/primary, so shouldn't restart_lsn point to
"latest WAL insert pointer"?

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-07-15 07:35:06 Re: Re: Removing unneeded self joins
Previous Message jian he 2024-07-15 06:35:36 Re: documentation structure