From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | fabriziomello(at)gmail(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Strange debug message of walreciver? |
Date: | 2015-03-09 01:33:47 |
Message-ID: | 20150309.103347.657915457122798479.t-ishii@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Sun, Mar 8, 2015 at 8:13 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>
>> > On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org>
> wrote:
>> >>
>> >> When I set log_min_messages to debug5 and looked into walreciver log,
>> >> I saw this:
>> >>
>> >> 3600 2015-03-08 09:47:38 JST DEBUG: sendtime 2015-03-08
>> > 09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication
>> > apply delay -1945478837 ms tran sfer latency 0 ms
>> >>
>> >> The "replication apply delay -1945478837 ms" part looks strange
>> >> because the delay is below 0. The number is formatted as %d in elog
>> >> call, and I suspect this is kind of integer overflow.
>> >>
>> >
>> > Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed
> that
>> > the integer overflow occurs because sometimes the return of the
>> > GetCurrentChunkReplayStartTime() is 0 (zero).
>> >
>> > I added an elog into GetReplicationApplyDelay to check this info:
>> >
>> > DEBUG: GetReplicationApplyDelay 479099832 seconds, 352 milliseconds,
>> > (0.000000, 479099832352083.000000)
>> > DEBUG: sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08
>> > 00:17:12.352043-03 replication apply delay -1936504800 ms transfer
> latency
>> > 0 ms
>> > DEBUG: GetReplicationApplyDelay 479099841 seconds, 450 milliseconds,
>> > (0.000000, 479099841450320.000000)
>> > DEBUG: sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08
>> > 00:17:21.450294-03 replication apply delay -1936495702 ms transfer
> latency
>> > 0 ms
>>
>> Right. Until walreceiver gets the first WAL record to replay,
>> xlogctl->currentChunkStartTime remains 0.
>>
>> > Maybe we should check before and return zero from
> GetReplicationApplyDelay.
>> > The attached patch implement it.
>>
>> Your patch regards 0 replication apply delay in the case above which
>> is confusing if the delay is actually 0.
>>
>> What about something like this to explicitly stats the delay data is
>> not available?
>>
>> elog(DEBUG2, "sendtime %s receipttime %s replication apply delay (N/A)
> transfer latency %d ms",
>>
>
> Makes sense. Attached patch implement what you suggested.
Looks good. If there's no objection, I will commit this.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2015-03-09 02:11:34 | Re: Bootstrap DATA is a pita |
Previous Message | Fabrízio de Royes Mello | 2015-03-09 01:04:36 | Re: Strange debug message of walreciver? |