Re: Add Information during standby recovery conflicts

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-10-04 11:48:22
Message-ID: 29da248a-e21c-a3eb-a051-f1ec79b13d31@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 8/28/20 4:14 PM, Drouvot, Bertrand wrote:
>
> On 8/28/20 7:03 AM, Masahiko Sawada wrote:
>> CAUTION: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender
>> and know the content is safe.
>>
>>
>>
>> On Thu, 27 Aug 2020 at 20:58, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com>
>> wrote:
>>>
>>> On 8/27/20 10:16 AM, Masahiko Sawada wrote
>>>> On Mon, 10 Aug 2020 at 23:43, Drouvot, Bertrand
>>>> <bdrouvot(at)amazon(dot)com> wrote:
>>>>> Hi,
>>>>>
>>>>> On 7/31/20 7:12 AM, Masahiko Sawada wrote:
>>>>>> +   tmpWaitlist = waitlist;
>>>>>> +   while (VirtualTransactionIdIsValid(*tmpWaitlist))
>>>>>> +   {
>>>>>> +       tmpWaitlist++;
>>>>>> +   }
>>>>>> +
>>>>>> +   num_waitlist_entries = (tmpWaitlist - waitlist);
>>>>>> +
>>>>>> +   /* display wal record information */
>>>>>> +   if (log_recovery_conflicts &&
>>>>>> (TimestampDifferenceExceeds(recovery_conflicts_log_time,
>>>>>> GetCurrentTimestamp(),
>>>>>> +                                   DeadlockTimeout))) {
>>>>>> +       LogBlockedWalRecordInfo(num_waitlist_entries, reason);
>>>>>> +       recovery_conflicts_log_time = GetCurrentTimestamp();
>>>>>> +   }
>>>>>>
>>>>>> recovery_conflicts_log_time is not initialized. And shouldn't we
>>>>>> compare the current timestamp to the timestamp when the startup
>>>>>> process started waiting?
>>>>>>
>>>>>> I think we should call LogBlockedWalRecordInfo() inside of the inner
>>>>>> while loop rather than at the beginning of
>>>>>> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases,
>>>>>> the
>>>>>> startup process waits until 'ltime', then enters
>>>>>> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'.
>>>>>> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the
>>>>>> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in
>>>>>> snapshot and tablespace conflict cases (i.g.
>>>>>> ResolveRecoveryConflictWithSnapshot() and
>>>>>> ResolveRecoveryConflictWithTablespace()), it enters
>>>>>> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for
>>>>>> reaching ‘ltime’ inside of the inner while look. So the above
>>>>>> condition could always be false.
>>>>> That would make the information being displayed after
>>>>> max_standby_streaming_delay is reached for the multiple cases you
>>>>> just
>>>>> described.
>>>> Sorry, it should be deadlock_timeout, not max_standby_streaming_delay.
>>>> Otherwise, the recovery conflict log message is printed when
>>>> resolution, which seems not to achieve the original purpose. Am I
>>>> missing something?
>>> Ok, I understand where the confusion is coming from.
>>>
>>> Indeed the new version is now printing the recovery conflict log
>>> message
>>> during the conflict resolution (while the initial intention was to be
>>> warned as soon as the replay had to wait).
>>>
>>> The advantage of the new version is that it would be consistent across
>>> all the conflicts scenarios (if not, we would get messages during the
>>> resolution or when the replay started waiting, depending of the
>>> conflict
>>> scenario).
>>>
>>> On the other hand, the cons of the new version is that we would miss
>>> messages when no resolution is needed (replay wait duration <
>>> max_standby_streaming_delay), but is that really annoying? (As no
>>> cancellation would occur)
>>>
>>> Thinking about it, i like the new version (being warned during the
>>> resolution) as we would get messages only when cancelation will occur
>>> (which is what the user might want to avoid, so the extra info would be
>>> useful).
>>>
>>> What do you think?
>> Hmm, I think we print the reason why backends are canceled even of as
>> now by ProcessInterrupts(). With this patch and related patches you
>> proposed on another thread, the startup process reports virtual xids
>> being interrupted, the reason, and LSN of blocked WAL, then processes
>> will also report its virtual xid and reason. Therefore, the new
>> information added by these patches is only the LSN of blocked WAL.
>
> That's completely right, let's come back to the original intention of
> this patch (means, don't wait for the conflict resolution to log
> messages).
>
> I'll submit a new version once updated.

Please find attached the new patch.

It provides the following outcomes depending on the conflict:

2020-10-04 09:08:51.923 UTC [30788] LOG:  recovery is waiting recovery
conflict on buffer pin

OR

2020-10-04 09:52:25.832 UTC [1249] LOG:  recovery is waiting recovery
conflict on snapshot
2020-10-04 09:52:25.832 UTC [1249] DETAIL:  Conflicting virtual
transaction ids: 3/2, 2/4.

OR

2020-10-04 09:11:51.717 UTC [30788] LOG:  recovery is waiting recovery
conflict on lock
2020-10-04 09:11:51.717 UTC [30788] DETAIL:  Conflicting virtual
transaction id: 2/5.

OR

2020-10-04 09:13:04.104 UTC [30788] LOG:  recovery is resolving recovery
conflict on database

Thanks

Bertrand

Attachment Content-Type Size
v1-0004-Add-Standby-Recovery-Conflicts-BlockedOnly-Info.patch text/plain 10.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-10-04 14:10:34 Re: Add Information during standby recovery conflicts
Previous Message Dilip Kumar 2020-10-04 10:31:17 Re: Re: [HACKERS] Custom compression methods