Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kevin Hale Boyes <kcboyes(at)gmail(dot)com>, Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka(at)iki(dot)fi, alvherre(at)alvh(dot)no-ip(dot)org, pashkin(dot)elfe(at)gmail(dot)com, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, euler(at)eulerto(dot)com, thomas(dot)munro(at)gmail(dot)com, peter(at)eisentraut(dot)org, amit(dot)kapila16(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-08-10 15:58:31
Message-ID: CAPpHfdvQ4ZZywz6Yys0YKeCKnHGEM4YybS-WZZp8R6Zh16yJyg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 6, 2024 at 8:36 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, Aug 06, 2024 at 05:17:10AM +0300, Alexander Korotkov wrote:
> > The 0001 patch is intended to improve this situation. Actually, it's
> > not right to just put RecoveryInProgress() after
> > GetXLogReplayRecPtr(), because more wal could be replayed between
> > these calls. Instead we need to recheck GetXLogReplayRecPtr() after
> > getting negative result of RecoveryInProgress() because WAL replay
> > position couldn't get updated after.
> > 0002 patch comprises fix for the header comment of WaitLSNSetLatches() function
> > 0003 patch comprises tests for pg_wal_replay_wait() errors.
>
> Before adding more tests, could it be possible to stabilize what's in
> the tree? drongo has reported one failure with the recovery test
> 043_wal_replay_wait.pl introduced recently by 3c5db1d6b016:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-05%2004%3A24%3A54

I'm currently running the 043_wal_replay_wait test in a loop on drongo.
No failures in more than 10 hours so far. As I pointed out in [1], it
seems that the test got stuck somewhere while launching BackgroundPsql.
Given that drongo has some strange failures from time to time (for
instance [2] or [3]), I doubt there is something specifically wrong in
the 043_wal_replay_wait test that caused the failure in question.

Therefore, while I'm going to continue looking into the reason for the
failure on drongo in the background, I'm going to go ahead with my
improvements for pg_wal_replay_wait().
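
For reference, the recheck described in the quoted 0001 summary above
can be sketched as a small standalone simulation. This is only an
illustration under stated assumptions, not the patch itself:
sim_get_replay_lsn() and sim_recovery_in_progress() are hypothetical
stand-ins for GetXLogReplayRecPtr() and RecoveryInProgress(), and the
"scheduled promotion" hook merely fakes more WAL being replayed between
the two calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated server state (not PostgreSQL code). */
static uint64_t replay_lsn = 0;
static bool     in_recovery = true;
static bool     promote_pending = false;
static uint64_t final_lsn = 0;

static void sim_reset(uint64_t lsn, bool recovering)
{
    replay_lsn = lsn;
    in_recovery = recovering;
    promote_pending = false;
}

/* Arrange for replay to advance to 'lsn' and recovery to end
 * right before the next recovery check, imitating the race. */
static void sim_schedule_promotion(uint64_t lsn)
{
    final_lsn = lsn;
    promote_pending = true;
}

/* Stand-in for GetXLogReplayRecPtr(). */
static uint64_t sim_get_replay_lsn(void)
{
    return replay_lsn;
}

/* Stand-in for RecoveryInProgress(); applies a pending promotion. */
static bool sim_recovery_in_progress(void)
{
    if (promote_pending)
    {
        replay_lsn = final_lsn;
        in_recovery = false;
        promote_pending = false;
    }
    return in_recovery;
}

typedef enum { WAIT_DONE, WAIT_NOT_IN_RECOVERY, WAIT_CONTINUE } WaitCheck;

static WaitCheck check_target(uint64_t target)
{
    assert(target != 0);    /* 0 is treated as an invalid LSN here */

    if (target <= sim_get_replay_lsn())
        return WAIT_DONE;

    if (!sim_recovery_in_progress())
    {
        /*
         * Recovery has ended, so the replay position cannot move any
         * further.  More WAL may have been replayed since the first
         * read, so recheck before reporting an error.
         */
        if (target <= sim_get_replay_lsn())
            return WAIT_DONE;
        return WAIT_NOT_IN_RECOVERY;
    }

    return WAIT_CONTINUE;   /* still in recovery: caller keeps waiting */
}
```

With the replay position at 100 and a promotion scheduled that ends
replay at 200, check_target(150) reports WAIT_DONE here, whereas a
version without the second read would report "not in recovery" even
though the target LSN had in fact been replayed.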

Links.
1. https://www.postgresql.org/message-id/CAPpHfduYkve0sw-qy4aCCmJv_MXfuuAQ7wyRQsX8NjaLVKDE1Q%40mail.gmail.com
2. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-02%2010%3A34%3A45
3. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-06-06%2012%3A36%3A11

------
Regards,
Alexander Korotkov
Supabase
