Re: Change log level for notifying hot standby is waiting non-overflowed snapshot

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
Date: 2025-03-31 03:51:06
Message-ID: 75994df0-81f1-4576-85f1-55da4ea3a392@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2025/03/28 0:13, torikoshia wrote:
> On 2025-03-27 11:06, torikoshia wrote:
>> Hi,
>>
>> I had another off-list discussion with Fujii-san, and according to the
>> following manual[1], it seems that a transaction with an overflowed
>> subtransaction is already considered inconsistent:
>>
>>   Reaching a consistent state can also be delayed in the presence of
>> both of these conditions:
>>
>>   - A write transaction has more than 64 subtransactions
>>   - Very long-lived write transactions
>>
>> IIUC, the manual suggests that both conditions must be met -- recovery
>> reaching at least minRecoveryPoint and no overflowed subtransactions
>> —- for the standby to be considered consistent.
>>
>> OTOH, the following log message is emitted even when subtransactions
>> have overflowed, which appears to contradict the definition of
>> consistency mentioned above:
>>
>>   LOG:  consistent recovery state reached
>>
>> This log message is triggered when recovery progresses beyond
>> minRecoveryPoint(according to CheckRecoveryConsistency()).
>> However, since this state does not satisfy 'consistency' defined in
>> the manual, I think it would be more accurate to log that it has
>> merely reached the "minimum recovery point".
>> Furthermore, it may be better to emit the above log message only when
>> recovery has progressed beyond minRecoveryPoint and there are no
>> overflowed subtransactions.
>>
>> Attached patch does this.
>>
>> Additionally, renaming variables such as reachedConsistency in
>> CheckRecoveryConsistency might also be appropriate.
>> However, in the attached patch, I have left them unchanged for now.
>>
>>
>> On 2025-03-25 00:55, Fujii Masao wrote:
>>> -                  case CAC_NOTCONSISTENT:
>>> +                  case CAC_NOTCONSISTENT_OR_OVERFLOWED:
>>>
>>> This new name seems a bit too long. I'm OK to leave the name as it is.
>>> Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me.
>>
>> Beyond just the length issue, given the understanding outlined above,
>> I now think CAC_NOTCONSISTENT does not actually need to be changed.
>>
>>
>>> In high-availability.sgml, the "Administrator's Overview" section already
>>> describes the conditions for accepting hot standby connections.
>>> This section should also be updated accordingly.
>>
>> Agreed.
>> I have updated this section to mention that the resolution is to close
>> the problematic transaction.
>> OTOH the changes made in v2 patch seem unnecessary, since the concept
>> of 'consistent' is already explained in the "Administrator's
>> Overview."
>>
>>
>> -                             errdetail("Consistent recovery state has not been yet reached.")));
>> +                             errdetail("Consistent recovery state has not been yet
>> reached, or snappshot is pending because subtransaction is
>> overflowed."),
>>
>> Given the above understanding, "or" is not appropriate in this
>> context, so I left this message unchanged.
>> Instead, I have added an errhint. The phrasing in the hint message
>> aligns with the manual, allowing users to search for this hint and
>> find the newly added resolution instructions.
>
> On second thought, it may not be appropriate to show this output to clients attempting to connect. This message should be notified not to clients but to administrators.
>
> From this point of view, it'd be better to output a message indicating the status inside ProcArrayApplyRecoveryInfo(). However, a straightforward implementation would result in the same message being logged every time an XLOG_RUNNING_XACTS WAL is received, making it noisy.
>
> Instead of directly outputting a log indicating that a hot standby connection cannot be established due to subtransaction overflow, the attached patch updates the manual so that administrators can determine whether a subtransaction overflow has occurred based on the modified log output.
>
>
> What do you think?

I had the same thought during our off-list discussion. However,
after reviewing the recovery code - such as recoveryStopsBefore(),
which checks whether a consistent state is reached - I now believe
the manual’s definition of a consistent state may be incorrect.
A consistent state should be defined as the point where recovery
has reached minRecoveryPoint.

If we were to change the definition to match the manual,
we would also need to update various recovery checks,
which wouldn't be a trivial task.

Given that, I now think it's better to revive your v1 patch,
which introduces a new postmaster signal and improves the error message
when connections are not accepted during hot standby. I've attached
a revised version of the patch based on your v1. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
v5-0001-Improve-error-message-when-standby-does-accept-co.patch text/plain 8.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2025-03-31 05:10:23 Re: Extended Statistics set/restore/clear functions.
Previous Message David Rowley 2025-03-31 03:33:03 Re: Memoize ANTI and SEMI JOIN inner