From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Euler Taveira <euler(at)eulerto(dot)com> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: speed up a logical replica setup |
Date: | 2024-07-15 06:46:57 |
Message-ID: | CAA4eK1KDon8qYLQRDmnqJq5LdD3dgZjX_sOq1ekBNav4dkqChQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jul 12, 2024 at 4:54 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Thu, Jul 11, 2024, at 2:00 PM, Alexander Lakhin wrote:
>
> May I ask you to look at another failure of the test that occurred today [1]?
>
>
> Thanks for the report!
>
> You are observing the same issue that Amit explained in [1].
> pg_create_logical_replication_slot() returns the EndRecPtr (see
> slot->data.confirmed_flush in DecodingContextFindStartpoint()). EndRecPtr points
> to the next record, which is a future position on an idle server. That's why
> recovery takes some time to finish: it is waiting for activity that advances
> the LSN. Since you modified LOG_SNAPSHOT_INTERVAL_MS to generate additional
> WAL records sooner, the EndRecPtr position is reached rapidly and the
> recovery ends quickly.
>
If the recovery ends quickly (which is expected due to the reduced
LOG_SNAPSHOT_INTERVAL_MS), then why do we see "error: recovery timed
out"?
> Hayato proposes a patch [2] to create an additional WAL record that has the same
> effect as your little hack: advance the LSN position so that recovery
> finishes soon. I don't like the solution, although it seems simple to implement.
> As Amit said if we know the ReadRecPtr, we could use it as consistent LSN. The
> problem is that it is used by logical decoding but it is not exposed. [reading
> the code...] When the logical replication slot is created, restart_lsn points to
> the lastReplayedEndRecPtr (see ReplicationSlotReserveWal()) that is the last
> record replayed.
>
'lastReplayedEndRecPtr' should be the value of restart_lsn on a
standby (when RecoveryInProgress() is true), but here we are creating
slots on the publisher/primary, so shouldn't restart_lsn point to the
"latest WAL insert pointer"?
--
With Regards,
Amit Kapila.