Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-04-09 05:13:18
Message-ID: CAA4eK1JV8g4Cm78TfMHDwWaeOo9Ogw5YR+_ZCcaOgKuydcBL8A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 8, 2024 at 7:01 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Thanks for pushing.
>
> I checked the BF status, and noticed one BF failure, which I think is related to
> a miss in the test code.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
>
> From the following log, I can see that the sync failed because the standby is
> lagging behind the failover slot.
>
> -----
> # No postmaster PID for node "cascading_standby"
> error running SQL: 'psql:<stdin>:1: ERROR: skipping slot synchronization as the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of the standby position 0/4000114'
> while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots();' at /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2042.
> # Postmaster PID for node "publisher" is 3715298
> -----
>
> I think it's because we missed calling wait_for_replay_catchup before syncing
> slots.
>
> -----
> $primary->safe_psql('postgres',
> "SELECT pg_create_logical_replication_slot('snap_test_slot', 'test_decoding', false, false, true);"
> );
> # <- missed waiting for the standby to catch up here
> $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
> -----
>
> While testing, I noticed another place where we were calling
> wait_for_replay_catchup before doing pg_replication_slot_advance, which also has
> a small chance of causing the failover slot to be ahead of the standby if
> some WAL is written between these two steps. So, I adjusted both places together.
>
> Here is a small patch to improve the test.
>

LGTM. I'll push this tomorrow morning unless there are any more
comments or suggestions.
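For readers following the failure: the check that fired compares two WAL positions. A pg_lsn in text form, such as 0/4000148, is two hexadecimal 32-bit halves separated by '/', so the slot position in the log is 0x148 - 0x114 = 52 bytes ahead of the standby position, and the sync is skipped. The Perl sketch below only mirrors the condition stated in the error message; the helper names lsn_to_int and slot_sync_would_proceed are hypothetical and are not part of PostgreSQL or its test framework.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a textual LSN such as '0/4000148' into one
# integer. The part before '/' is the high 32 bits and the part after it is
# the low 32 bits, both in hexadecimal.
sub lsn_to_int {
    my ($lsn) = @_;
    my ($hi, $lo) = split m{/}, $lsn, 2;
    die "malformed LSN '$lsn'\n" unless defined $lo;
    return hex($hi) * 4294967296 + hex($lo);
}

# Hypothetical helper mirroring the error message: the sync is skipped when
# the received slot LSN is ahead of the standby position. This is an
# illustration only, not the server's implementation.
sub slot_sync_would_proceed {
    my ($slot_lsn, $standby_lsn) = @_;
    return lsn_to_int($slot_lsn) <= lsn_to_int($standby_lsn);
}

# Values taken from the buildfarm log quoted above.
my ($slot, $standby) = ('0/4000148', '0/4000114');

# prints: slot 0/4000148 vs standby 0/4000114: sync skipped
printf "slot %s vs standby %s: %s\n", $slot, $standby,
    slot_sync_would_proceed($slot, $standby) ? 'sync proceeds' : 'sync skipped';
```

In the TAP test itself, the remedy is the one proposed in the quoted mail: let the standby catch up before syncing, e.g. with `$primary->wait_for_replay_catchup($standby1);` placed just before the pg_sync_replication_slots() call.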

--
With Regards,
Amit Kapila.
