Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-06-14 12:46:15
Message-ID: CAPpHfdvGRssjqwX1+idm5Tu-eWsTcx6DthB2LhGqA1tZ29jJaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Ivan!

On Wed, Jun 12, 2024 at 11:36 AM Kartyshov Ivan
<i(dot)kartyshov(at)postgrespro(dot)ru> wrote:
>
> Hi, Alexander, Here, I made some improvements according to your
> discussion with Heikki.
>
> On 2024-04-11 18:09, Alexander Korotkov wrote:
> > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> > wrote:
> >> In a nutshell, it's possible for the loop in WaitForLSN to exit
> >> without
> >> cleaning up the process from the heap. I was able to hit that by
> >> adding
> >> a delay after the addLSNWaiter() call:
> >>
> >> > TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
> >> > postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
> >> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
> >> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
> >> > postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
> >> > postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
> >> > postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]
> >>
> >> I think there's a similar race condition if the timeout is reached at
> >> the same time that the startup process wakes up the process.
> >
> > Thank you for catching this. I think WaitForLSN() just needs to call
> > deleteLSNWaiter() unconditionally after exit from the loop.
>
> Fix and add injection point test on this race condition.
>
> > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> > wrote:
> >> The docs could use some-copy-editing, but just to point out one issue:
> >>
> >> > There are also procedures to control the progress of recovery.
> >>
> >> That's copy-pasted from an earlier sentence at the table that lists
> >> functions like pg_promote(), pg_wal_replay_pause(), and
> >> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control
> >> the
> >> progress of recovery like those functions do, it only causes the
> >> calling
> >> backend to wait.
>
> Fix documentation and add extra tests on multi-standby replication
> and cascade replication.

Thank you for the revised patch.

I see couple of items which are not addressed in this revision.
* As Heikki pointed, that it's currently not possible in one round
trip to call call pg_wal_replay_wait() and do other job. The attached
patch addresses this. It milds the requirement. Now, it's not
necessary to be in atomic context. It's only necessary to have no
active snapshot. This new requirement works for me so far. I
appreciate a feedback on this.
* As Alvaro pointed, multiple waiters case isn't covered by the test
suite. That leads to no coverage of some code paths. The attached
patch addresses this by adding a test case with multiple waiters.

The rest looks good to me.

Links.
1. https://www.postgresql.org/message-id/d1303584-b763-446c-9409-f4516118219f%40iki.fi
2.https://www.postgresql.org/message-id/202404051815.eri4u5q6oj26%40alvherre.pgsql

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v19-0001-Implement-pg_wal_replay_wait-stored-procedure.patch application/x-patch 39.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2024-06-14 12:54:34 assertion failure at cost_memoize_rescan()
Previous Message Tomas Vondra 2024-06-14 12:27:50 Re: 回复: An implementation of multi-key sort