Re: Interesting streaming replication issue

From: James Sewell <james(dot)sewell(at)jirotech(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Gunnar Nick Bluth <gunnar(dot)bluth(at)pro-open(dot)de>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Interesting streaming replication issue
Date: 2017-08-03 00:38:36
Message-ID: CAANVwEtPJ+YCFGu+vH9eH4Nvk7FZ6NH1TXMYbyy-nQGBZUJOEw@mail.gmail.com
Lists: pgsql-general

OK, this is reproducible now.

1. Stop a standby
2. Write some data to the master
3. Wait till the master has archived some WAL logs
4. Wait till the archived logs have been removed from pg_xlog
5. Start the standby.

The standby recovers all logs from the master's archive up to log X, then
tries to fetch log X+1 and fails because it doesn't exist yet.

It then tries to start streaming from the master at log X (not X+1) and
fails, because the master has already archived X and removed it from
pg_xlog. This loops forever; example below.

scp: /archive/xlog//0000000D.history: No such file or directory
2017-08-03 10:26:41 AEST [578]: [1037-1] user=,db=,client= (0:00000)LOG:
restored log file "0000000C0000006E000000AE" from archive
scp: /archive/xlog//0000000C0000006E000000AF: No such file or directory
2017-08-03 10:26:41 AEST [68161]: [1-1] user=,db=,client= (0:00000)LOG:
started streaming WAL from primary at 6E/AE000000 on timeline 12
2017-08-03 10:26:41 AEST [68161]: [2-1] user=,db=,client= (0:XX000)FATAL:
could not receive data from WAL stream: ERROR: requested WAL segment
0000000C0000006E000000AE has already been removed
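
To make the X versus X+1 point concrete, here is a minimal sketch that decodes the segment file names from the log above. It assumes the default 16 MB WAL segment size, and the helper names parse_wal_segment and next_wal_segment are made up for illustration, not anything PostgreSQL ships.

```python
# Assumption: default 16 MB WAL segments (a compile-time setting in this era).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_segment(name, segment_size=WAL_SEGMENT_SIZE):
    """Return (timeline, start LSN) for a 24-hex-digit WAL file name."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log_id = int(name[8:16], 16)     # middle 8 hex digits: high half of the LSN
    seg_no = int(name[16:24], 16)    # last 8 hex digits: segment within log_id
    offset = seg_no * segment_size   # byte offset where the segment starts
    return timeline, "%X/%X" % (log_id, offset)


def next_wal_segment(name, segment_size=WAL_SEGMENT_SIZE):
    """Return the file name of the segment that follows `name`."""
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_no = int(name[16:24], 16) + 1
    segments_per_log = 0x100000000 // segment_size
    if seg_no == segments_per_log:   # roll over into the next log_id
        log_id += 1
        seg_no = 0
    return "%08X%08X%08X" % (timeline, log_id, seg_no)


if __name__ == "__main__":
    restored = "0000000C0000006E000000AE"
    print(parse_wal_segment(restored))
    print(next_wal_segment(restored))
```

Under that assumption, the restored file 0000000C0000006E000000AE starts at 6E/AE000000 on timeline 12, which is exactly where the walreceiver asks to start streaming, and the following file name is the ...AF one that scp could not find.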

At this stage the standby has log X in pg_xlog, and this log has an
identical md5 checksum to the log in the master archive.
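
For anyone wanting to repeat that comparison, a minimal sketch of the check follows. The function names are hypothetical, and it assumes both copies of the segment have been brought onto one machine, since it only reads local files.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_segment(path_a, path_b):
    """True when both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling same_segment() with the standby's pg_xlog copy and the archived copy of log X would return True in the situation described here.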

Performing a pg_switch_xlog on the master pushes log X+1 to the archive,
which is picked up by the standby allowing streaming replication to start.

The only interesting thing I can see in log X is that it's 99% made up
of FPI_FOR_HINT records.

Any ideas?

Cheers,
James

James Sewell,
PostgreSQL Team Lead / Solutions Architect

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
P (+61) 2 8099 9000  W www.jirotech.com  F (+61) 2 8099 9099

On Fri, Jul 28, 2017 at 6:28 AM, James Sewell <james(dot)sewell(at)jirotech(dot)com>
wrote:

>
>>>
>>>> are you sure you're scp'ing from the archive, not from pg_xlog?
>>>>
>>>
>>> Yes:
>>>
>>> restore_command = 'scp -o StrictHostKeyChecking=no 10.154.19.30:/archive/xlog//%f %p'
>>>
>>> Although you are right - that would almost make sense if I had done that!
>>>
>>
>> Sounds a lot like a cleanup process on your archive directory or
>> something getting in the way. Are the logs pg is asking for in that archive
>> dir?
>>
>
> That's the strange thing - if you look at the log not only are they there,
> the standby has already retrieved them.
>
> It's then asking for the log again via the stream.

