Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kevin Hale Boyes <kcboyes(at)gmail(dot)com>, Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka(at)iki(dot)fi, alvherre(at)alvh(dot)no-ip(dot)org, pashkin(dot)elfe(at)gmail(dot)com, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, euler(at)eulerto(dot)com, thomas(dot)munro(at)gmail(dot)com, peter(at)eisentraut(dot)org, amit(dot)kapila16(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-08-06 10:23:35
Message-ID: CAPpHfduYkve0sw-qy4aCCmJv_MXfuuAQ7wyRQsX8NjaLVKDE1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 6, 2024 at 11:18 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com>
wrote:
> On Tue, Aug 6, 2024 at 8:36 AM Michael Paquier <michael(at)paquier(dot)xyz>
wrote:
> > On Tue, Aug 06, 2024 at 05:17:10AM +0300, Alexander Korotkov wrote:
> > > The 0001 patch is intended to improve this situation. Actually, it's
> > > not right to just put RecoveryInProgress() after
> > > GetXLogReplayRecPtr(), because more wal could be replayed between
> > > these calls. Instead we need to recheck GetXLogReplayRecPtr() after
> > > getting negative result of RecoveryInProgress() because WAL replay
> > > position couldn't get updated after.
> > > 0002 patch comprises fix for the header comment of
WaitLSNSetLatches() function
> > > 0003 patch comprises tests for pg_wal_replay_wait() errors.
> >
> > Before adding more tests, could it be possible to stabilize what's in
> > the tree? drongo has reported one failure with the recovery test
> > 043_wal_replay_wait.pl introduced recently by 3c5db1d6b016:
> >
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-05%2004%3A24%3A54
>
> Thank you for pointing!
> Surely, I'll fix this before.

Something breaks in these lines during second iteration of the loop.
"SELECT pg_current_wal_insert_lsn()" has been queried from primary, but
standby didn't receive "CALL pg_wal_replay_wait('...');"

for (my $i = 0; $i < 5; $i++)
{
print($i);
$node_primary->safe_psql('postgres',
"INSERT INTO wait_test VALUES (${i});");
my $lsn =
$node_primary->safe_psql('postgres',
"SELECT pg_current_wal_insert_lsn()");
$psql_sessions[$i] = $node_standby1->background_psql('postgres');
$psql_sessions[$i]->query_until(
qr/start/, qq[
\\echo start
CALL pg_wal_replay_wait('${lsn}');
SELECT log_count(${i});
]);
}

I wonder what could it be. Probably something hangs inside launching
background psql... I'll investigate this more.

------
Regards,
Alexander Korotkov
Supabase

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Junwang Zhao 2024-08-06 10:28:16 Re: Support specify tablespace for each merged/split partition
Previous Message Peter Eisentraut 2024-08-06 10:23:03 Remove hardcoded hash opclass function signature exceptions