Re: Replication failure, slave requesting old segments

From: "Phil Endecott" <spam_from_pgsql_lists(at)chezphil(dot)org>
To: <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Replication failure, slave requesting old segments
Date: 2018-08-12 19:53:08
Message-ID: 1534103588284@dmwebmail.dmwebmail.chezphil.org
Lists: pgsql-general

Phil Endecott wrote:
> On the master, I have:
>
> wal_level = replica
> archive_mode = on
> archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
> scp %p backup:backup/postgresql/archivedir/%f'
>
> On the slave I have:
>
> standby_mode = 'on'
> primary_conninfo = 'user=postgres host=master port=5432'
> restore_command = 'scp backup:backup/postgresql/archivedir/%f %p'
>
> hot_standby = on

> 2018-08-11 00:05:50.364 UTC [615] LOG: restored log file "0000000100000007000000D0" from archive
> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file or directory
> 2018-08-11 00:05:51.325 UTC [7208] LOG: started streaming WAL from primary at 7/D0000000 on timeline 1
> 2018-08-11 00:05:51.325 UTC [7208] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000100000007000000D0 has already been removed

I am wondering if I need to set wal_keep_segments to at least 1 or 2 for
this to work. I currently have it unset and I believe the default is 0.

My understanding was that when archive_command/restore_command are used to
copy WAL segments, it is not necessary to set wal_keep_segments to retain
files in pg_xlog on the master; the slave can get everything through a
combination of restore_command copies and streaming.
But these lines from the log:

2018-08-11 00:12:15.797 UTC [7954] LOG: redo starts at 7/D0F956C0
2018-08-11 00:12:16.068 UTC [7954] LOG: consistent recovery state reached at 7/D0FFF088

make me think that there is an issue when the slave reaches the end of the
copied WAL file. I speculate that the useful content of this WAL segment
ends at FFF088, which is followed by an empty gap due to record sizes. But
the slave tries to start streaming from this point, D0FFF088, not D1000000.
If the master still had a copy of segment D0 then it would be able to stream
this gap followed by the real content in the current segment D1.
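
To make the arithmetic behind that speculation concrete, here is a rough Python sketch that maps an LSN from the log to its WAL segment file name and to the offset inside that segment. It assumes the default 16 MB segment size (not stated anywhere above), and the helper names are made up for illustration; they are not PostgreSQL functions.

```python
# Assumption: default WAL segment size of 16 MB (not confirmed for this cluster).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn):
    """Convert an LSN such as '7/D0FFF088' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_file_name(timeline, lsn):
    """Name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


def segment_offset(lsn):
    """Byte offset of the LSN within its segment."""
    return parse_lsn(lsn) % WAL_SEGMENT_SIZE


def segment_start(lsn):
    """LSN of the first byte of the segment containing the given LSN."""
    pos = parse_lsn(lsn)
    pos -= pos % WAL_SEGMENT_SIZE
    return "%X/%08X" % (pos >> 32, pos & 0xFFFFFFFF)


if __name__ == "__main__":
    lsn = "7/D0FFF088"  # "consistent recovery state reached" in the log
    print(segment_file_name(1, lsn))
    print(hex(segment_offset(lsn)))
    print(WAL_SEGMENT_SIZE - segment_offset(lsn))
    print(segment_start(lsn))
```

Under that assumption, 7/D0FFF088 lies inside segment 0000000100000007000000D0, a few kilobytes short of its end, and the start of that segment is 7/D0000000, which is the position named in the "started streaming WAL from primary" log line quoted above. Either way the request falls in D0 rather than D1.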

Does that make any sense at all?

Regards, Phil.
