From: | "Gilberto Castillo" <gilberto(dot)castillo(at)etecsa(dot)cu> |
---|---|
To: | "John Scalia" <jayknowsunix(at)gmail(dot)com> |
Cc: | "Andrew Krause" <andrew(dot)krause(at)breakthroughfuel(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: Wal archive way behind in streaming replication |
Date: | 2014-06-25 17:42:15 |
Message-ID: | 43325.192.168.207.54.1403718135.squirrel@webmail.etecsa.cu |
Lists: | pgsql-admin |
Stop the master, then copy the pg_xlog folder to your slave.
That has worked for me.
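As a rough sketch of the copy step (to be run only after the master has been stopped), something like the following Python could bring over whatever segments the slave is missing. The function name and both directory paths are placeholders of mine, not anything that ships with PostgreSQL, and it only handles plain 24-hex-digit segment files, so timeline .history files would have to be copied separately.

```python
import os
import re
import shutil
from pathlib import Path

# WAL segment files are named with exactly 24 hexadecimal digits.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def copy_missing_segments(src_dir, dst_dir):
    """Copy WAL segments found in src_dir but absent from dst_dir.

    Returns the sorted list of file names that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for path in sorted(src.iterdir()):
        # Skip archive_status/, *.history, *.backup and anything else
        # that is not a plain segment file.
        if not path.is_file() or not WAL_SEGMENT.match(path.name):
            continue
        target = dst / path.name
        if target.exists():
            continue  # never overwrite a segment the slave already has
        # Copy under a temporary name, then rename, so a half-written
        # file never appears under the final segment name.
        tmp = dst / (path.name + ".tmp")
        shutil.copy2(path, tmp)
        os.replace(tmp, target)
        copied.append(path.name)
    return copied
```

For example, copy_missing_segments("/path/to/master/pg_xlog", "/path/to/slave/restore_dir") returns the names it copied; both paths are hypothetical and need to match your own layout.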
>
> A little examination of the pgarch.c file showed what the archive process
> on the primary is doing. Anyway, to ensure that the primary knows that it
> has transmitted all the up to
> date WALs, I went into the primary's data/pg_xlog/archive_status directory
> and performed "touch 00000003000000900000036.ready" and repeated this
> command for the other WALs up to
> *44.ready. This really shouldn't have been a problem as the most recent
> WAL file in pg_xlog was *45. The archiver then picked up all those WAL
> files and transmitted them to the
> standbys. At least I saw them appear on the standby in the directory
> specified in the recovery.conf file.
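For reference, WAL segment names are 24 hexadecimal digits: 8 for the timeline, 8 for the log, and 8 for the segment. Below is a small sketch of how the names for a range such as *36 through *44 could be generated, assuming timeline 3 and log 9 as the log line further down ("9/35000000 on timeline 3") suggests. The helper names are made up for illustration, and the code only builds the names; it does not touch anything in archive_status.

```python
def wal_segment_name(timeline, log_id, segment):
    """Build a 24-hex-digit WAL segment file name."""
    return "%08X%08X%08X" % (timeline, log_id, segment)


def ready_marker_names(timeline, log_id, first_seg, last_seg):
    """List the '.ready' marker names for an inclusive segment range."""
    return [
        wal_segment_name(timeline, log_id, seg) + ".ready"
        for seg in range(first_seg, last_seg + 1)
    ]


# Assumed values: timeline 3, log 9, segments 0x36 through 0x44.
markers = ready_marker_names(3, 9, 0x36, 0x44)
print(markers[0])
print(markers[-1])
print(len(markers))
```

Note that the segment numbers are hexadecimal, so the range *36 to *44 covers 15 files, not 9.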
>
> Now, what I really don't understand is the standby's behavior. After the
> WALs arrived, I saw nothing in today's pg_log/Wed.log file showing it saw
> them. I then issued a service
> postgresql-9.3 restart and this is what was spit out in the log:
>
> LOG: entering standby mode
> LOG: restored log file "00000000300000000900000035" from archive
> LOG: unexpected pageaddr 9/1B000000 in log segment
> 00000000300000000900000035, offset 0
> LOG: started streaming WAL from primary at 9/35000000 on timeline 3
> FATAL: the database system is starting up
> LOG: consistent recovery state reached at 9/350000C8
> LOG: redo starts at 9/350000C8
> LOG: database system is ready to accept read only connections
>
> Two things stand out here. First, the standby didn't seem to process the
> newly arrived WAL files, and second, what's with the FATAL: in the
> logfile?
> --
> Jay
>
> On 6/24/2014 2:52 PM, Andrew Krause wrote:
>> You shouldn’t have to touch the files as long as they aren’t compressed.
>> You may have to restart the standby instance to get the recovery to
>> begin though. I’d suggest tailing your instance log and restarting the
>> standby instance. It should show that the logs from the gap are
>> applying right away at startup.
>>
>>
>> Andrew Krause
>>
>>
>>
>>
>> On Jun 24, 2014, at 1:19 PM, John Scalia <jayknowsunix(at)gmail(dot)com> wrote:
>>
>>> Ok, I did the copy from the pg_xlog directory into the directory
>>> specified in recovery.conf. The standby servers seem fine with that, however,
>>> just copying does not inform the primary that
>>> the copy has happened. The archive_status directory under pg_xlog on
>>> the primary still thinks the last WAL sent was *B7 and yet it's now
>>> writing *C9. When I did the copy it was
>>> only up to *C7 and nothing else has shown in the standby's directory.
>>>
>>> Now, the *.done files in archive_status are just zero length, but I'm a
>>> bit hesitant to just do a touch for the ones I manually copied as I
>>> don't know if this is from an in-memory
>>> queue or if PostgreSQL reads the contents of this regularly in order
>>> to decide what to copy.
>>>
>>> Is that safe to do?
>>>
>>> On 6/24/2014 9:56 AM, Andrew Krause wrote:
>>>> You can copy all of the WAL logs from your gap to the standby. If you
>>>> place them in the correct location (directory designated for restore)
>>>> the instance will automatically apply them all.
>>>>
>>>>
>>>> Andrew Krause
>>>>
>>>>
>>>>
>>>> On Jun 23, 2014, at 9:24 AM, John Scalia <jayknowsunix(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>> Came in this morning to numerous complaints from pgpool about the
>>>>> standby servers being behind from the primary. Looking into it, no
>>>>> WAL files had been transferred since late Friday. All I did was
>>>>> restart the primary and the WAL archiving resumed, however, looking at
>>>>> the WAL files on the standby servers, this is never going to catch
>>>>> up. Now, I've got the archive_timeout on the primary = 600 or 10
>>>>> minutes and I see WAL files in pg_xlog every 10 minutes. As they show
>>>>> up on the standby servers, they're also 10 minutes apart, but the
>>>>> primary is writing *21 and the standbys are only up to *10. Now, like
>>>>> I said prior, with there being 10 minutes (600 seconds) between
>>>>> transfers (the same pace as the WALs are generated) it will never
>>>>> catch up. Is this really the intended behavior? How would I get the
>>>>> additional WAL files over to the standbys without waiting 10 minutes
>>>>> to copy them one at a time?
>>>>> --
>>>>> Jay
>>>>>
>>>>>
>>>>> --
>>>>> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
>>>>> To make changes to your subscription:
>>>>> http://www.postgresql.org/mailpref/pgsql-admin
Saludos,
Gilberto Castillo
La Habana, Cuba
Attachment | Content-Type | Size |
---|---|---|
unknown_filename | text/plain | 179 bytes |