Re: Replication failure, slave requesting old segments

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Phil Endecott <spam_from_pgsql_lists(at)chezphil(dot)org>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Replication failure, slave requesting old segments
Date: 2018-08-11 21:48:01
Message-ID: e89b76f9-f60a-a645-587f-00aeb3c68770@aklaver.com
Lists: pgsql-general

On 08/11/2018 12:42 PM, Phil Endecott wrote:
> Hi Adrian,
>
> Adrian Klaver wrote:
>> Looks like the master recycled the WAL's while the slave could not
>> connect.
>
> Yes but... why is that a problem?  The master is copying the WALs to
> the backup server using scp, where they remain forever.  The slave gets

To me it looks like that did not happen:

2018-08-11 00:05:50.364 UTC [615] LOG: restored log file
"0000000100000007000000D0" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file
or directory
2018-08-11 00:05:51.325 UTC [7208] LOG: started streaming WAL from
primary at 7/D0000000 on timeline 1
2018-08-11 00:05:51.325 UTC [7208] FATAL: could not receive data from
WAL stream: ERROR: requested WAL segment 0000000100000007000000D0 has
already been removed

Above, 0000000100000007000000D0 is gone/recycled on the master, and the
archived copy does not seem to be complete, since streaming replication
is still trying to fetch that segment from the master.
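
To make the link between the two log lines explicit, here is a minimal
sketch of how a WAL segment file name maps to the LSN shown in the
"started streaming WAL from primary at ..." message. It assumes the
default 16 MB WAL segment size; the function names are hypothetical
helpers of mine, not part of PostgreSQL.

```python
# Assumption: default 16 MB WAL segment size (it can be changed at build/initdb time).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_segment(name, segment_size=WAL_SEGMENT_SIZE):
    """Split a 24-hex-digit WAL file name into (timeline, start LSN as int).

    Hypothetical helper for illustration only.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log_id = int(name[8:16], 16)     # middle 8 hex digits: high 32 bits of the LSN
    seg = int(name[16:24], 16)       # last 8 hex digits: segment number within log_id
    start_lsn = (log_id << 32) + seg * segment_size
    return timeline, start_lsn


def format_lsn(lsn):
    """Render an integer LSN in the hi/lo hex form used in the log above."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


timeline, lsn = parse_wal_segment("0000000100000007000000D0")
print(timeline, format_lsn(lsn))  # prints: 1 7/D0000000
```

So a request to stream from 7/D0000000 on timeline 1 is a request for
the start of segment 0000000100000007000000D0, the same file the slave
had just restored from the archive.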

Below, you kick the master and it coughs up the files to the archive,
including *D0 and *D1 on up to *D4, and then streaming picks up at *D5.

2018-08-11 00:55:49.741 UTC [7954] LOG: restored log file
"0000000100000007000000D0" from archive
2018-08-11 00:56:12.304 UTC [7954] LOG: restored log file
"0000000100000007000000D1" from archive
2018-08-11 00:56:35.481 UTC [7954] LOG: restored log file
"0000000100000007000000D2" from archive
2018-08-11 00:56:57.443 UTC [7954] LOG: restored log file
"0000000100000007000000D3" from archive
2018-08-11 00:57:21.723 UTC [7954] LOG: restored log file
"0000000100000007000000D4" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D5: No such file
or directory
2018-08-11 00:57:22.915 UTC [7954] LOG: unexpected pageaddr 7/C7000000
in log segment 0000000100000007000000D5, offset 0
2018-08-11 00:57:23.114 UTC [12348] LOG: started streaming WAL from
primary at 7/D5000000 on timeline 1

Best guess is the archiving did not work as expected during:

"(During this time the master was also down for a shorter period.)"
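
One way to test that guess would be to look for holes in the archive
directory listing. The following is only a sketch: it assumes the
default 16 MB segment size and a single timeline, and the function
names are hypothetical. It checks only that the files exist, not that
each one is complete.

```python
import re

# Assumption: default 16 MB segments, so 256 segments per 4 GB "log" ID.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)


def segment_number(name):
    """Linear sequence number of a WAL segment (hypothetical helper)."""
    log_id = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return log_id * SEGMENTS_PER_LOG + seg


def segment_name(timeline, number):
    """Inverse of segment_number for a given timeline."""
    log_id, seg = divmod(number, SEGMENTS_PER_LOG)
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


def missing_segments(filenames):
    """Return segment names absent between the oldest and newest one listed."""
    # Ignore anything that is not a plain segment file (.history, .backup, ...).
    names = [n for n in filenames if re.fullmatch(r"[0-9A-F]{24}", n)]
    if not names:
        return []
    timelines = {int(n[:8], 16) for n in names}
    if len(timelines) != 1:
        raise ValueError("this sketch handles a single timeline only")
    timeline = timelines.pop()
    present = {segment_number(n) for n in names}
    return [
        segment_name(timeline, n)
        for n in range(min(present), max(present) + 1)
        if n not in present
    ]
```

Feeding it the output of a directory listing of archivedir (for
example via os.listdir) would show whether any segment in the range
was never copied.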

> them from there before it starts streaming.  So it shouldn't matter
> if the master recycles them, as the slave should be able to get everything
> using the combination of scp and then streaming.
>
> Am I missing something about how this sort of replication is supposed to
> work?
>
>
> Thanks, Phil.

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
