Re: Replication failure, slave requesting old segments

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Phil Endecott <spam_from_pgsql_lists(at)chezphil(dot)org>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Replication failure, slave requesting old segments
Date: 2018-08-12 20:23:29
Message-ID: 444ada2d-8896-cd74-57dd-531999190182@aklaver.com
Lists: pgsql-general

On 08/12/2018 12:53 PM, Phil Endecott wrote:
> Phil Endecott wrote:
>> On the master, I have:
>>
>> wal_level = replica
>> archive_mode = on
>> archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
>> scp %p backup:backup/postgresql/archivedir/%f'
>>
>> On the slave I have:
>>
>> standby_mode = 'on'
>> primary_conninfo = 'user=postgres host=master port=5432'
>> restore_command = 'scp backup:backup/postgresql/archivedir/%f %p'
>>
>> hot_standby = on
>
>> 2018-08-11 00:05:50.364 UTC [615] LOG: restored log file "0000000100000007000000D0" from archive
>> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
>> 2018-08-11 00:05:51.325 UTC [7208] LOG: started streaming WAL from primary at 7/D0000000 on timeline 1
>> 2018-08-11 00:05:51.325 UTC [7208] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000100000007000000D0 has already been removed
>
>
> I am wondering if I need to set wal_keep_segments to at least 1 or 2 for
> this to work. I currently have it unset and I believe the default is 0.

Given that WAL segments are only 16 MB each, I would probably bump it up
to be on the safe side, or use replication slots:

https://www.postgresql.org/docs/9.6/static/warm-standby.html

26.2.6. Replication Slots

Note, though, that replication slots put no limit on how much WAL is
kept, so a long outage of the slave could result in WAL piling up on the
master.
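
For illustration, a minimal sketch of the two options, assuming
PostgreSQL 9.6 or 10. The value 32 and the slot name 'standby1' are
placeholders I made up, not recommendations, and I have not tested this
against your setup:

```sql
-- Sketch only: assumes PostgreSQL 9.6 or 10. The value 32 and the slot
-- name 'standby1' are placeholders.

-- Option 1, on the master: keep extra finished segments in pg_xlog.
-- Roughly 32 segments * 16 MB = 512 MB of extra disk.
ALTER SYSTEM SET wal_keep_segments = 32;
SELECT pg_reload_conf();

-- Option 2, on the master: create a physical replication slot.
-- Needs max_replication_slots > 0 (the 9.6 default is 0, and changing
-- it requires a restart).
SELECT pg_create_physical_replication_slot('standby1');

-- On the slave, add this line to recovery.conf and restart it:
--   primary_slot_name = 'standby1'

-- On the master: check how far back the slot is holding WAL.
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

If you go with the slot, it is worth watching restart_lsn (or the size
of pg_xlog) while the slave is down, for the reason given above.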

>
> My understanding was that when using archive_command/restore_command to copy
> WAL segments it would not be necessary to use wal_keep_segments to retain
> files in pg_xlog on the server; the slave can get everything using a
> combination of copying files using the restore_command and streaming.
> But these lines from the log:
>
> 2018-08-11 00:12:15.797 UTC [7954] LOG: redo starts at 7/D0F956C0
> 2018-08-11 00:12:16.068 UTC [7954] LOG: consistent recovery state reached at 7/D0FFF088
>
> make me think that there is an issue when the slave reaches the end of the
> copied WAL file. I speculate that the useful content of this WAL segment
> ends at FFF088, which is followed by an empty gap due to record sizes. But
> the slave tries to start streaming from this point, D0FFF088, not D1000000.
> If the master still had a copy of segment D0 then it would be able to stream
> this gap followed by the real content in the current segment D1.
>
> Does that make any sense at all?
>
>
> Regards, Phil.
>
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
