From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | RE: Synchronizing slots from primary to standby |
Date: | 2024-04-08 13:31:22 |
Message-ID: | OS0PR01MB571665359F2F5DCD3ADABC9F94002@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Monday, April 8, 2024 6:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > On Saturday, April 6, 2024 12:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> > > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > Yeah, that could be the first step. We can probably add an injection
> > > point to control the bgwrite behavior and then add tests involving
> > > walsender performing the decoding. But I think it is important to
> > > have sufficient tests in this area as I see they are quite helpful in uncovering
> the issues.
> >
> > Here is the patch to drop the subscription in the beginning so that
> > the restart_lsn of the lsub1_slot won't be advanced due to concurrent
> > xl_running_xacts from bgwriter. The subscription will be re-created
> > after all the slots are sync-ready. I think maybe we can use this to
> > stabilize the test as a first step and then think about how to make
> > use of injection point to add more tests if it's worth it.
> >
>
> Pushed.
Thanks for pushing.
I checked the BF status, and noticed one BF failure, which I think is related to
a miss in the test code.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
From the following log, I can see the sync failed because the standby is
lagging behind of the failover slot.
-----
# No postmaster PID for node "cascading_standby"
error running SQL: 'psql:<stdin>:1: ERROR: skipping slot synchronization as the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of the standby position 0/4000114'
while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots();' at /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2042.
# Postmaster PID for node "publisher" is 3715298
-----
I think it's because we missed to call wait_for_replay_catchup before syncing
slots.
-----
$primary->safe_psql('postgres',
"SELECT pg_create_logical_replication_slot('snap_test_slot', 'test_decoding', false, false, true);"
);
# ? missed to wait here
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-----
While testing, I noticed another place where we were calling
wait_for_replay_catchup before doing pg_replication_slot_advance, which also has
a small possibility to cause the failover slot to be ahead of the standby if
some logs are written in between these two steps. So, I adjusted them together.
Here is a small patch to improve the test.
Best Regards,
Hou zj
Attachment | Content-Type | Size |
---|---|---|
0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch | application/octet-stream | 1.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-04-08 13:43:11 | Re: PostgreSQL 17 Release Management Team & Feature Freeze |
Previous Message | Andrew Dunstan | 2024-04-08 13:29:57 | Re: WIP Incremental JSON Parser |