From: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com> |
---|---|
To: | Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add Information during standby recovery conflicts |
Date: | 2020-10-04 11:48:22 |
Message-ID: | 29da248a-e21c-a3eb-a051-f1ec79b13d31@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 8/28/20 4:14 PM, Drouvot, Bertrand wrote:
>
> On 8/28/20 7:03 AM, Masahiko Sawada wrote:
>> CAUTION: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender
>> and know the content is safe.
>>
>>
>>
>> On Thu, 27 Aug 2020 at 20:58, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com>
>> wrote:
>>>
>>> On 8/27/20 10:16 AM, Masahiko Sawada wrote
>>>> On Mon, 10 Aug 2020 at 23:43, Drouvot, Bertrand
>>>> <bdrouvot(at)amazon(dot)com> wrote:
>>>>> Hi,
>>>>>
>>>>> On 7/31/20 7:12 AM, Masahiko Sawada wrote:
>>>>>> + tmpWaitlist = waitlist;
>>>>>> + while (VirtualTransactionIdIsValid(*tmpWaitlist))
>>>>>> + {
>>>>>> + tmpWaitlist++;
>>>>>> + }
>>>>>> +
>>>>>> + num_waitlist_entries = (tmpWaitlist - waitlist);
>>>>>> +
>>>>>> + /* display wal record information */
>>>>>> + if (log_recovery_conflicts &&
>>>>>> (TimestampDifferenceExceeds(recovery_conflicts_log_time,
>>>>>> GetCurrentTimestamp(),
>>>>>> + DeadlockTimeout))) {
>>>>>> + LogBlockedWalRecordInfo(num_waitlist_entries, reason);
>>>>>> + recovery_conflicts_log_time = GetCurrentTimestamp();
>>>>>> + }
>>>>>>
>>>>>> recovery_conflicts_log_time is not initialized. And shouldn't we
>>>>>> compare the current timestamp to the timestamp when the startup
>>>>>> process started waiting?
>>>>>>
>>>>>> I think we should call LogBlockedWalRecordInfo() inside of the inner
>>>>>> while loop rather than at the beginning of
>>>>>> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases,
>>>>>> the
>>>>>> startup process waits until 'ltime', then enters
>>>>>> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'.
>>>>>> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the
>>>>>> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in
>>>>>> snapshot and tablespace conflict cases (i.g.
>>>>>> ResolveRecoveryConflictWithSnapshot() and
>>>>>> ResolveRecoveryConflictWithTablespace()), it enters
>>>>>> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for
>>>>>> reaching ‘ltime’ inside of the inner while look. So the above
>>>>>> condition could always be false.
>>>>> That would make the information being displayed after
>>>>> max_standby_streaming_delay is reached for the multiple cases you
>>>>> just
>>>>> described.
>>>> Sorry, it should be deadlock_timeout, not max_standby_streaming_delay.
>>>> Otherwise, the recovery conflict log message is printed when
>>>> resolution, which seems not to achieve the original purpose. Am I
>>>> missing something?
>>> Ok, I understand where the confusion is coming from.
>>>
>>> Indeed the new version is now printing the recovery conflict log
>>> message
>>> during the conflict resolution (while the initial intention was to be
>>> warned as soon as the replay had to wait).
>>>
>>> The advantage of the new version is that it would be consistent across
>>> all the conflicts scenarios (if not, we would get messages during the
>>> resolution or when the replay started waiting, depending of the
>>> conflict
>>> scenario).
>>>
>>> On the other hand, the cons of the new version is that we would miss
>>> messages when no resolution is needed (replay wait duration <
>>> max_standby_streaming_delay), but is that really annoying? (As no
>>> cancellation would occur)
>>>
>>> Thinking about it, i like the new version (being warned during the
>>> resolution) as we would get messages only when cancelation will occur
>>> (which is what the user might want to avoid, so the extra info would be
>>> useful).
>>>
>>> What do you think?
>> Hmm, I think we print the reason why backends are canceled even of as
>> now by ProcessInterrupts(). With this patch and related patches you
>> proposed on another thread, the startup process reports virtual xids
>> being interrupted, the reason, and LSN of blocked WAL, then processes
>> will also report its virtual xid and reason. Therefore, the new
>> information added by these patches is only the LSN of blocked WAL.
>
> That's completely right, let's come back to the original intention of
> this patch (means, don't wait for the conflict resolution to log
> messages).
>
> I'll submit a new version once updated.
Please find attached the new patch.
It provides the following outcomes depending on the conflict:
2020-10-04 09:08:51.923 UTC [30788] LOG: recovery is waiting recovery
conflict on buffer pin
OR
2020-10-04 09:52:25.832 UTC [1249] LOG: recovery is waiting recovery
conflict on snapshot
2020-10-04 09:52:25.832 UTC [1249] DETAIL: Conflicting virtual
transaction ids: 3/2, 2/4.
OR
2020-10-04 09:11:51.717 UTC [30788] LOG: recovery is waiting recovery
conflict on lock
2020-10-04 09:11:51.717 UTC [30788] DETAIL: Conflicting virtual
transaction id: 2/5.
OR
2020-10-04 09:13:04.104 UTC [30788] LOG: recovery is resolving recovery
conflict on database
Thanks
Bertrand
Attachment | Content-Type | Size |
---|---|---|
v1-0004-Add-Standby-Recovery-Conflicts-BlockedOnly-Info.patch | text/plain | 10.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2020-10-04 14:10:34 | Re: Add Information during standby recovery conflicts |
Previous Message | Dilip Kumar | 2020-10-04 10:31:17 | Re: Re: [HACKERS] Custom compression methods |