Re: streaming replication timeout error

From: 高健 <luckyjackgao(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: streaming replication timeout error
Date: 2013-10-10 00:51:57
Message-ID: CAL454F1TeTqNsPRNohcxOxZ+mN_K1nELt7R1xProtOTdUPD_ag@mail.gmail.com
Lists: pgsql-general

Hello:

Thanks for replying.

The recovery.conf file on the standby (DB2) looks like this:

standby_mode = 'on'
primary_conninfo = 'host=DB1 port=5432 application_name=testpg user=postgres connect_timeout=10 keepalives_idle=5 keepalives_interval=1'
recovery_target_timeline = 'latest'
restore_command = 'scp -o "ConnectTimeout 5" -i /opt/PostgresPlus/9.2AS/.ssh/id_edb DB1:/opt/PostgresPlus/9.2AS/data/arch/%f %p'
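
To make the timeout-related settings in primary_conninfo easier to see, here is a minimal Python sketch that splits the string into key/value pairs. The helper name parse_conninfo is hypothetical, and it only handles the simple unquoted key=value form used above, not the full libpq connection string syntax.

```python
# Sketch only: handles plain "key=value key=value" strings, not quoted values.
TIMEOUT_KEYS = ("connect_timeout", "keepalives_idle", "keepalives_interval")

# The primary_conninfo value from the recovery.conf shown above.
CONNINFO = (
    "host=DB1 port=5432 application_name=testpg "
    "user=postgres connect_timeout=10 keepalives_idle=5 keepalives_interval=1"
)


def parse_conninfo(conninfo):
    """Split a simple conninfo string into a dict of settings."""
    settings = {}
    for token in conninfo.split():
        key, _, value = token.partition("=")
        settings[key] = value
    return settings


settings = parse_conninfo(CONNINFO)
# Keep only the timeout-related settings; libpq interprets them as seconds.
timeouts = {key: int(settings[key]) for key in TIMEOUT_KEYS if key in settings}
print(timeouts)
```

In libpq, connect_timeout only limits how long establishing the connection may take, while keepalives_idle and keepalives_interval control TCP keepalive probes on a connection that is already open.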

I am not familiar with the scp command, but I think scp is used here to
copy archived WAL files from the primary to the standby.

Maybe the ConnectTimeout value is too small, and when the network is
unreliable the restore_command fails and a FATAL error is reported?
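
One way to test that idea would be to wrap the scp call in a small script that retries a few times with a longer ConnectTimeout. The following is only a sketch under assumptions: the function names, the retry count, and the delay are hypothetical, and the host, key, and archive paths are copied from the recovery.conf above. The run and sleep parameters exist only so the logic can be exercised without a real network.

```python
import subprocess
import time

# Values copied from the restore_command above.
HOST = "DB1"
KEY_FILE = "/opt/PostgresPlus/9.2AS/.ssh/id_edb"
ARCHIVE_DIR = "/opt/PostgresPlus/9.2AS/data/arch"


def build_scp_command(wal_file, dest_path, connect_timeout=5):
    """Build the same scp invocation as the restore_command, as an argv list."""
    return [
        "scp",
        "-o", "ConnectTimeout %d" % connect_timeout,
        "-i", KEY_FILE,
        "%s:%s/%s" % (HOST, ARCHIVE_DIR, wal_file),
        dest_path,
    ]


def fetch_wal(wal_file, dest_path, attempts=3, delay=1.0,
              connect_timeout=10, run=subprocess.call, sleep=time.sleep):
    """Try scp up to `attempts` times; return 0 on success, else the last exit code."""
    rc = 1
    for attempt in range(1, attempts + 1):
        rc = run(build_scp_command(wal_file, dest_path, connect_timeout))
        if rc == 0:
            return 0
        if attempt < attempts:
            sleep(delay)  # brief pause before the next attempt
    return rc
```

A script calling fetch_wal with %f and %p and exiting with its return value could then be used as restore_command. Note that the standby also asks for files that are not in the archive yet, and the command must return nonzero in that case, so a retry loop like this delays that answer by a few seconds each time.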

In fact, I am a little confused about restore_command. We are using
streaming replication, so why is restore_command still needed to copy
archived WAL files? Isn't that the old warm standby (file shipping) method?

Best Regards
jian gao

2013/10/9 Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>

> On 10/08/2013 07:58 PM, 高健 wrote:
>
>> Hello:
>>
>> My customer encountered connection timeouts while using streaming
>> replication with one primary and one standby.
>>
>> The original log is in Japanese. Because there are no error codes like
>> Oracle's ORA-xxx, I tried to translate the Japanese messages into English,
>> but the wording might not match PostgreSQL's English messages.
>>
>>
>> The most important part is:
>>
>> 2013-09-22 09:52:47 JST[28297][51d1fbcb.6e89-2][0][XX000]FATAL: Could
>> not receive data from WAL stream: could not receive data from server:
>> connection timeout
>> scp: /opt/PostgresPlus/9.2AS/data/arch/000000AC000001F10000004A: No
>> such file or directory
>>
>> I was asked:
>> Under what circumstances does the above FATAL error occur?
>>
>> I looked into the postgresql.conf files for the primary and standby
>> servers and ran some experiments.
>>
>> I found:
>> Scenario I:
>>
>> If the wanted WAL file is removed manually:
>>
>> On both the primary and the standby, the log looks like this:
>> FATAL: could not receive data from WAL stream: FATAL: requested WAL
>> segment 000000010000000000000011 has already been removed
>>
>>
>>
>
>> But I haven't found a good explanation that fits the FATAL error in the
>> logs. Can anybody give me some info?
>>
>
> Would seem to me the interesting part is:
>
>
> scp: /opt/PostgresPlus/9.2AS/data/arch/000000AC000001F10000004A: No
> such file or directory
>
> Are you using scp to move WAL files to an archive directory?
>
> If so, it seems scp is having issues, either network interruption or the
> file is disappearing under it.
>
>
>
>> Thanks in advance
>>
>> jian gao
>>
>>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)gmail(dot)com
>
