From: | Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [HACKERS] make async slave to wait for lsn to be replayed |
Date: | 2024-06-12 08:36:05 |
Message-ID: | ca85a3c56e3c26cc29d1df14021845f6@postgrespro.ru |
Lists: | pgsql-hackers |
Hi Alexander,

I made some improvements based on your discussion with Heikki.
On 2024-04-11 18:09, Alexander Korotkov wrote:
> On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> In a nutshell, it's possible for the loop in WaitForLSN to exit without
>> cleaning up the process from the heap. I was able to hit that by adding
>> a delay after the addLSNWaiter() call:
>>
>> > TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
>> > postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
>> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
>> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
>> > postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
>> > postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
>> > postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]
>>
>> I think there's a similar race condition if the timeout is reached at
>> the same time that the startup process wakes up the process.
>
> Thank you for catching this. I think WaitForLSN() just needs to call
> deleteLSNWaiter() unconditionally after exit from the loop.
Fixed. I also added an injection-point test for this race condition.
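
To make the shape of the fix easier to follow, here is a simplified standalone model of the pattern. It is only a sketch and not the code from the patch: WaiterModel, add_waiter(), delete_waiter(), and wait_for_lsn_model() are hypothetical stand-ins, LSNs are plain integers, and the timeout is modeled as an iteration limit. The point it illustrates is that the cleanup call is idempotent and runs on every exit path from the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-process wait state (not the patch's struct). */
typedef struct WaiterModel
{
    bool in_heap;   /* true while registered in the waiters heap */
    long wait_lsn;  /* LSN this waiter is waiting for */
} WaiterModel;

typedef enum { WAIT_OK, WAIT_TIMEOUT } WaitResult;

/* Register the waiter; models what addLSNWaiter() does conceptually. */
static void
add_waiter(WaiterModel *w, long lsn)
{
    w->wait_lsn = lsn;
    w->in_heap = true;
}

/* Idempotent removal: safe to call even if the waker already removed us. */
static void
delete_waiter(WaiterModel *w)
{
    if (w->in_heap)
        w->in_heap = false;
}

/*
 * observed[i] is the replay position seen at loop iteration i.
 * max_iterations models the timeout (an assumption of this sketch).
 */
WaitResult
wait_for_lsn_model(WaiterModel *w, long target,
                   const long *observed, size_t n_observed,
                   size_t max_iterations)
{
    WaitResult result = WAIT_TIMEOUT;
    size_t     i;

    add_waiter(w, target);

    for (i = 0; i < max_iterations && i < n_observed; i++)
    {
        if (observed[i] >= target)
        {
            /* Target reached, but nobody may have removed us from the heap. */
            result = WAIT_OK;
            break;
        }
    }

    /* Unconditional cleanup, whichever way the loop ended. */
    delete_waiter(w);
    assert(!w->in_heap);

    return result;
}
```

In this model, both the success exit and the timeout exit leave the waiter unregistered, so a later call starts from a clean state.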
> On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> The docs could use some copy-editing, but just to point out one issue:
>>
>> > There are also procedures to control the progress of recovery.
>>
>> That's copy-pasted from an earlier sentence at the table that lists
>> functions like pg_promote(), pg_wal_replay_pause(), and
>> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control the
>> progress of recovery like those functions do, it only causes the calling
>> backend to wait.
I fixed the documentation and added extra tests for multi-standby
replication and cascading replication.
--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com
Attachment | Content-Type | Size |
---|---|---|
v18-0001-Implement-pg_wal_replay_wait-stored-procedure.patch | text/x-diff | 38.5 KB |