
Re: Timeline issue if StartupXLOG() is interrupted right before end-of-recovery record is done

From:	Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To:	Roman Eskin <r(dot)eskin(at)arenadata(dot)io>
Cc:	pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Timeline issue if StartupXLOG() is interrupted right before end-of-recovery record is done
Date:	2025-01-28 09:51:29
Message-ID:	A950518B-4116-492B-8773-C9A5CE1620AF@yandex-team.ru
Lists:	pgsql-hackers

> On 21 Jan 2025, at 16:47, Roman Eskin <r(dot)eskin(at)arenadata(dot)io> wrote:
>
>>
>> Persisting recovery signal file for some _timeout_ seems super dangerous to me. In distributed systems every extra _timeout_ is a source of complexity, uncertainty and despair.
>
> The approach is not about persisting the signal files for some timeout. Currently the files are removed in StartupXLOG() before writeTimeLineHistory() and PerformRecoveryXLogAction() are called. The suggestion is to move the file removal after PerformRecoveryXLogAction() inside StartupXLOG().

Sending a node into a repeated promote-fail cycle without resolving the root cause seems like an even less appealing idea.
If something prevented promotion, why should we retry it by this particular method?

Even in the case of the transient failure you described (power loss), it does not sound like a good idea to retry promotion after the node comes back online. The user will get an unexpected split-brain.
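The disagreement comes down to where a crash can land between the steps. Below is a toy model, not PostgreSQL code: the step names mirror the functions named in this thread, "remove_signal_files" is a hypothetical stand-in for removing standby.signal / recovery.signal, and only two on-disk facts are tracked. Under these assumptions, the current order has crash points where the signal file is already gone but no end-of-recovery record exists. The proposed order (removal after PerformRecoveryXLogAction()) closes that window, but every unfinished crash, including one right after the end-of-recovery record is written, leaves a signal file behind. The next startup then enters recovery again, which is the retry objected to above.

```python
"""Toy model of the StartupXLOG() step order debated in this thread.

Hypothetical: this is not PostgreSQL code. Step names mirror the functions
named in the thread; "remove_signal_files" stands in for removing
standby.signal / recovery.signal. Only two on-disk facts are modelled.
"""

CURRENT_ORDER = ["remove_signal_files", "writeTimeLineHistory",
                 "PerformRecoveryXLogAction"]
PROPOSED_ORDER = ["writeTimeLineHistory", "PerformRecoveryXLogAction",
                  "remove_signal_files"]


def state_after_crash(order, steps_done):
    """On-disk state if the server dies after the first `steps_done` steps."""
    signal_file = True              # the signal file exists before promotion
    end_of_recovery_record = False
    for step in order[:steps_done]:
        if step == "remove_signal_files":
            signal_file = False
        elif step == "PerformRecoveryXLogAction":
            end_of_recovery_record = True
        # writeTimeLineHistory() changes neither fact tracked here.
    return signal_file, end_of_recovery_record


def classify(signal_file, end_of_recovery_record):
    """Label what the next startup faces in this toy model."""
    if signal_file:
        return "recover again"      # file present: startup re-enters recovery
    if end_of_recovery_record:
        return "done"               # promotion completed
    return "gap"                    # signal file gone, promotion unfinished


def crash_outcomes(order):
    """Outcome for a crash after 0, 1, ..., len(order) steps."""
    return [classify(*state_after_crash(order, n))
            for n in range(len(order) + 1)]


if __name__ == "__main__":
    for name, order in [("current", CURRENT_ORDER),
                        ("proposed", PROPOSED_ORDER)]:
        print(name, crash_outcomes(order))
```

Running it prints `current ['recover again', 'gap', 'gap', 'done']` and `proposed ['recover again', 'recover again', 'recover again', 'done']`: reordering trades the "gap" outcomes for "recover again" outcomes rather than eliminating the unfinished-promotion case.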

Best regards, Andrey Borodin.

In response to

Re: Timeline issue if StartupXLOG() is interrupted right before end-of-recovery record is done at 2025-01-21 11:47:19 from Roman Eskin
