Re: Recovery - New Slave PostgreSQL 9.2

From: "drum(dot)lucas(at)gmail(dot)com" <drum(dot)lucas(at)gmail(dot)com>
To: Rajesh Madiwale <rajeshmadiwale65(at)gmail(dot)com>
Cc: John Scalia <jayknowsunix(at)gmail(dot)com>, Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Recovery - New Slave PostgreSQL 9.2
Date: 2016-01-14 00:10:36
Message-ID: CAE_gQfX3BaVHDkeFHAmufZEY-Uew7EJjMY_fnHxM1yK5qysFbQ@mail.gmail.com
Lists: pgsql-admin

Problem solved.

I did the pg_basebackup again.
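
For the archives, a re-run along the lines John suggested (streaming WAL during the backup instead of fetching it at the end) could be sketched roughly as below. This is a sketch, not the exact command from this recovery: the host 192.168.100.1 (the master's address quoted further down), the data directory path, the label bb_newslave and the helper name basebackup_cmd are all assumptions for illustration. It only prints the command so it can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Sketch only: prints a pg_basebackup command line for review, does not run it.
# Host, data directory and label are assumed values, not confirmed in this thread.
basebackup_cmd() {
  local host="$1"      # server to copy from (master or an upstream standby)
  local datadir="$2"   # empty data directory on the new slave
  echo "pg_basebackup --pgdata=$datadir --format=plain --xlog-method=stream" \
       "--host=$host --port=5432 --username=replicator" \
       "--label=bb_newslave --progress"
}

basebackup_cmd 192.168.100.1 /var/lib/pgsql/9.2/data
```

Plain format is used here as the conservative choice with stream mode; the 9.2 pg_basebackup documentation is the place to check which format and method combinations are allowed.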

I got the replication from the Master and after that, I changed to the new
slave.
It's working now.
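
For anyone comparing timelines while debugging a similar error: the first eight hex digits of a WAL segment (or .history) file name are the timeline ID. In the listings quoted below, the archive held timeline 2 segments while recovery was asking for timeline 4 and timeline 5 files. A small sketch of that check follows; wal_timeline is a hypothetical helper name, not a PostgreSQL tool.

```shell
#!/usr/bin/env bash
# Print the timeline ID encoded in a WAL segment or .history file name.
# wal_timeline is a made-up helper for illustration only.
wal_timeline() {
  local name="${1##*/}"        # strip any directory part
  local tli_hex="${name:0:8}"  # first 8 hex digits = timeline ID
  if [[ ! "$tli_hex" =~ ^[0-9A-Fa-f]{8}$ ]]; then
    echo "not a WAL or history file name: $1" >&2
    return 1
  fi
  echo "$((16#$tli_hex))"      # convert hex to decimal
}

wal_timeline 0000000200000C6A0000002D          # segment present in the archive
wal_timeline 0000000400000C68000000C8          # segment recovery asked for
wal_timeline ../wal_archive/00000005.history   # history file recovery asked for
```

If the numbers differ between what the archive holds and what recovery requests, the new standby's data directory and the server it streams from are not on the same timeline.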

Thank you.

Lucas Possamai

kinghost.co.nz
<http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>

On 10 January 2016 at 19:01, drum(dot)lucas(at)gmail(dot)com <drum(dot)lucas(at)gmail(dot)com>
wrote:

>> Not exactly certain what you're asking here, but have you tried pointing
>> your second slave (the one that didn't fail) directly to your existing
>> primary, as opposed to cascading like now? Then, if that works (it may not
>> now, as it's been out of sync too long), just point the newly built slave at
>> the slave that's working.
>
>
> Not sure what you're asking... but...
>
> *Following this example:*
>
> *NEW SCENARIO:*
>
> master1 -->slave1 -->slave2
>
> -->slave1 -->newslave (this is the one I'm setting up)
>
>
> The slave1 already gets the replication from the Master.
>
> ------------------------------------------------------------------------------
>
>
>
>> If a .history file is present in the newstandby/pg_xlog directory, move
>> it out of there. Also check for the same file in wal_archive and move it
>> out of there as well, then try restarting the new standby.
>
>
> I've done that already, and I get the same error:
> *2016-01-10 06:00:14.572 UTC|1793|FATAL: timeline 2 of the primary does
> not match recovery target timeline 8*
>
>
>
>
>
>
> Lucas Possamai
>
> kinghost.co.nz
> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>
> On 10 January 2016 at 13:54, Rajesh Madiwale <rajeshmadiwale65(at)gmail(dot)com>
> wrote:
>
>>
>> Hi Lucas,
>>
>> If a .history file is present in the newstandby/pg_xlog directory, move
>> it out of there. Also check for the same file in wal_archive and move it
>> out of there as well, then try restarting the new standby.
>>
>> Regards,
>> Rajesh.
>>
>>
>> On Sunday, January 10, 2016, drum(dot)lucas(at)gmail(dot)com <drum(dot)lucas(at)gmail(dot)com>
>> wrote:
>>
>>> Should I point the new slave's replication at the same DB?
>>>
>>> Lucas
>>>
>>> On Sunday, 10 January 2016, drum(dot)lucas(at)gmail(dot)com <drum(dot)lucas(at)gmail(dot)com>
>>> wrote:
>>>
>>>> John,
>>>>>
>>>>>
>>>>> I'd recommend that you specify -X s, as just specifying -X or
>>>>> --xlog gives you the default value of fetch rather than stream.
>>>>
>>>>
>>>> Sorry, I misunderstood.
>>>> So you'd recommend re-running pg_basebackup
>>>> with --xlog-method=stream?
>>>>
>>>> I was hoping to find another way, as pg_basebackup takes 30h to
>>>> complete :(
>>>>
>>>>
>>>>
>>>> Lucas Possamai
>>>>
>>>> kinghost.co.nz
>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>
>>>> On 10 January 2016 at 12:21, drum(dot)lucas(at)gmail(dot)com <drum(dot)lucas(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>>> I'd recommend that you specify -X s, as just specifying -X or
>>>>>> --xlog gives you the default value of fetch rather than stream. Also,
>>>>>> the WAL directory listing you just provided indicates that your
>>>>>> servers' timelines are far apart.
>>>>>
>>>>>
>>>>> I don't think it's necessary to use -X - Check HERE
>>>>> <http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html>
>>>>>
>>>>> --xlog
>>>>>
>>>>> Using this option is equivalent of using -X with method fetch.
>>>>>
>>>>>
>>>>> -------------------------------
>>>>>
>>>>>> Now, you're saying that one system went down, which is why you're
>>>>>> trying to do this, but was it the first slave that failed? Or did your primary
>>>>>> fail? That would possibly explain why the timelines are different. If your
>>>>>> primary failed and this standby assumed command, then its timeline would
>>>>>> have incremented. So, if you're trying to put this one back as a slave,
>>>>>> that's not a really trivial process. You'd have to set the old primary back
>>>>>> up as a slave to the current primary, and then execute another failover, this
>>>>>> time back to your original primary, and then rebuild all the slaves all
>>>>>> over.
>>>>>
>>>>>
>>>>> *PAST SCENARIO:*
>>>>> master1 -->slave1 -->slave2
>>>>> -->slave1 -->db-slave0 - *this one went down*
>>>>>
>>>>> *NEW SCENARIO:*
>>>>> master1 -->slave1 -->slave2
>>>>> -->slave1 -->newslave (this is the one I'm setting up)
>>>>>
>>>>>
>>>>>
>>>>> Lucas Possamai
>>>>>
>>>>> kinghost.co.nz
>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>
>>>>> On 10 January 2016 at 12:16, John Scalia <jayknowsunix(at)gmail(dot)com>
>>>>> wrote:
>>>>>
>>>>>> I'd recommend that you specify -X s, as just specifying -X or
>>>>>> --xlog gives you the default value of fetch rather than stream. Also,
>>>>>> the WAL directory listing you just provided indicates that your
>>>>>> servers' timelines are far apart.
>>>>>>
>>>>>> Now, you're saying that one system went down, which is why you're
>>>>>> trying to do this, but was it the first slave that failed? Or did your primary
>>>>>> fail? That would possibly explain why the timelines are different. If your
>>>>>> primary failed and this standby assumed command, then its timeline would
>>>>>> have incremented. So, if you're trying to put this one back as a slave,
>>>>>> that's not a really trivial process. You'd have to set the old primary back
>>>>>> up as a slave to the current primary, and then execute another failover, this
>>>>>> time back to your original primary, and then rebuild all the slaves all
>>>>>> over.
>>>>>>
>>>>>> Just saying,
>>>>>> Jay
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>> On Jan 9, 2016, at 3:48 PM, "drum(dot)lucas(at)gmail(dot)com" <
>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>
>>>>>> Hi John,
>>>>>>
>>>>>>> First, when you built the slave server, I'm assuming you used
>>>>>>> pg_basebackup and if you did, did you specify -X s in your command?
>>>>>>
>>>>>>
>>>>>> Yep. I ran the pg_basebackup into the new slave from ANOTHER SLAVE...
>>>>>> ssh postgres@slave1 'pg_basebackup --pgdata=- --format=tar
>>>>>> --label=bb_master --progress --host=localhost --port=5432
>>>>>> --username=replicator --xlog | pv --quiet --rate-limit 100M' | tar
>>>>>> -x --no-same-owner
>>>>>>
>>>>>> *-X = --xlog*
>>>>>>
>>>>>> On my new slave, I've got all the WAL archives. (The master copies
>>>>>> the WAL there all the time...)
>>>>>> ls /var/lib/pgsql/9.2/wal_archive:
>>>>>> 0000000200000C6A0000002D
>>>>>> 0000000200000C6A0000002E
>>>>>>
>>>>>> and not
>>>>>> ../wal_archive/0000000400000C68000000C8` not found
>>>>>> ../wal_archive/00000005.history` not found
>>>>>>
>>>>>> Remember that I'm trying to do a cascading replication (It was
>>>>>> working with another slave. But the server went down and I'm trying to set
>>>>>> up a new one)
>>>>>>
>>>>>>> I would suggest, in spite of the 2TB size, rebuilding the standby
>>>>>>> servers with a proper pg_basebackup.
>>>>>>
>>>>>>
>>>>>> I've already run pg_basebackup more than once, and I always get
>>>>>> the same error... :(
>>>>>>
>>>>>> Is there anything else, guys? Please help, hehehe
>>>>>>
>>>>>>
>>>>>>
>>>>>> Lucas Possamai
>>>>>>
>>>>>> kinghost.co.nz
>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>
>>>>>> On 10 January 2016 at 10:33, John Scalia <jayknowsunix(at)gmail(dot)com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm a little late to this thread, but in looking at the errors you
>>>>>>> originally posted, two things come to mind:
>>>>>>>
>>>>>>> First, when you built the slave server, I'm assuming you used
>>>>>>> pg_basebackup and if you did, did you specify -X s in your command?
>>>>>>>
>>>>>>> Second, the missing history file isn't an issue, in case you're
>>>>>>> unfamiliar with this. However, yeah, the missing WAL segment is, as well as
>>>>>>> the bad timeline error. Is that missing segment still on your primary?
>>>>>>> You know you could just copy it manually to your standby and start from
>>>>>>> that. As far as the timeline error, that's disturbing to me as it's
>>>>>>> claiming the primary is actually a failed over standby. AFAIK, that's the
>>>>>>> main if not only way transaction timelines increment.
>>>>>>>
>>>>>>> I would suggest, in spite of the 2TB size, rebuilding the standby
>>>>>>> servers with a proper pg_basebackup.
>>>>>>> --
>>>>>>> Jay
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>> On Jan 9, 2016, at 2:19 PM, "drum(dot)lucas(at)gmail(dot)com" <
>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>
>>>>>>> Hi, thanks for your reply... I've been working on this problem for
>>>>>>> 20h =(
>>>>>>>
>>>>>>> *# cat postgresql.conf | grep synchronous_standby_names*
>>>>>>> #synchronous_standby_names = '' - It's commented
>>>>>>>
>>>>>>> *# cat postgresql.conf | grep application_name*
>>>>>>> log_line_prefix = '%m|%p|%q[%c]@%r|%u|%a|%d '
>>>>>>> ( %a = application name )
>>>>>>>
>>>>>>> I can't resync the whole DB again, because it has 2TB of data :(
>>>>>>>
>>>>>>> Is there anything else I can do?
>>>>>>> Thank you
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Lucas Possamai
>>>>>>>
>>>>>>> kinghost.co.nz
>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>
>>>>>>> On 10 January 2016 at 04:22, Shreeyansh Dba <
>>>>>>> shreeyansh2014(at)gmail(dot)com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Jan 9, 2016 at 3:28 PM, drum(dot)lucas(at)gmail(dot)com <
>>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>>
>>>>>>>>> My recovery was like that!
>>>>>>>>> I was already using that way.. I still have the problem =\
>>>>>>>>>
>>>>>>>>> Is there anything I can do?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Lucas Possamai
>>>>>>>>>
>>>>>>>>> kinghost.co.nz
>>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>>
>>>>>>>>> On 9 January 2016 at 22:53, Shreeyansh Dba <
>>>>>>>>> shreeyansh2014(at)gmail(dot)com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Lucas,
>>>>>>>>>>
>>>>>>>>>> Yes, now recovery.conf looks good.
>>>>>>>>>> Hope this solves your problem.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks and regards,
>>>>>>>>>> ShreeyanshDBA Team
>>>>>>>>>> Shreeyansh Technologies
>>>>>>>>>> www.shreeyansh.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Jan 9, 2016 at 3:07 PM, drum(dot)lucas(at)gmail(dot)com <
>>>>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi there!
>>>>>>>>>>>
>>>>>>>>>>> Yep, it's correct:
>>>>>>>>>>> It looks like You have a set up A (Master) ---> B (Replica) --->
>>>>>>>>>>> C Replica (Base backup from Replica B)
>>>>>>>>>>>
>>>>>>>>>>> Master (A): 192.168.100.1
>>>>>>>>>>> Slave1 (B): 192.168.100.2
>>>>>>>>>>> Slave2 (C): 192.168.100.3
>>>>>>>>>>>
>>>>>>>>>>> My recovery.conf in slave2(C) is:
>>>>>>>>>>>
>>>>>>>>>>> restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
>>>>>>>>>>> archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
>>>>>>>>>>> recovery_target_timeline = 'latest'
>>>>>>>>>>> standby_mode = on
>>>>>>>>>>> primary_conninfo = 'host=192.168.100.2 port=5432 user=replicator application_name=replication_slave02'
>>>>>>>>>>>
>>>>>>>>>>> So it seems right to me... Is that what you mean?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Lucas Possamai
>>>>>>>>>>>
>>>>>>>>>>> kinghost.co.nz
>>>>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>>>>
>>>>>>>>>>> On 9 January 2016 at 22:25, Shreeyansh Dba <
>>>>>>>>>>> shreeyansh2014(at)gmail(dot)com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jan 9, 2016 at 8:29 AM, drum(dot)lucas(at)gmail(dot)com <
>>>>>>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> ** NOTE: I ran the pg_basebackup from another STANDBY SERVER.
>>>>>>>>>>>>> Not from the MASTER*
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lucas Possamai
>>>>>>>>>>>>>
>>>>>>>>>>>>> kinghost.co.nz
>>>>>>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9 January 2016 at 15:28, drum(dot)lucas(at)gmail(dot)com <
>>>>>>>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Still trying to solve the problem...
>>>>>>>>>>>>>> Can anyone help, please?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lucas Possamai
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> kinghost.co.nz
>>>>>>>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9 January 2016 at 14:45, drum(dot)lucas(at)gmail(dot)com <
>>>>>>>>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sure... Here's the total information:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://superuser.com/questions/1023770/new-postgresql-slave-server-error-timeline
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> recovery.conf:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
>>>>>>>>>>>>>>> archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
>>>>>>>>>>>>>>> recovery_target_timeline = 'latest'
>>>>>>>>>>>>>>> standby_mode = on
>>>>>>>>>>>>>>> primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator application_name=replication_new_slave'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Lucas Possamai
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> kinghost.co.nz
>>>>>>>>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 9 January 2016 at 14:37, Ian Barwick <ian(at)2ndquadrant(dot)com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 16/01/09 9:23, drum(dot)lucas(at)gmail(dot)com wrote:
>>>>>>>>>>>>>>>> > Hi all!
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > I've done the pg_basebackup from the live server to a new slave server...
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > I've recovered the WAL files, but now that I've configured it to
>>>>>>>>>>>>>>>> > replicate from the master (recovery.conf) I get this error:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > ../wal_archive/0000000400000C68000000C8` not found
>>>>>>>>>>>>>>>> > ../wal_archive/00000005.history` not found
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > FATAL: timeline 2 of the primary does not match recovery target timeline 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you post the contents of your recovery.conf file, suitably
>>>>>>>>>>>>>>>> anonymised if necessary?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ian Barwick
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Lucas,
>>>>>>>>>>>>
>>>>>>>>>>>> Following your question, I reproduced the same error:
>>>>>>>>>>>>
>>>>>>>>>>>> cp: cannot stat `/pgdata/arch/00000003.history': No such file or directory
>>>>>>>>>>>> 2016-01-09 14:11:42 IST FATAL: timeline 1 of the primary does not match recovery target timeline 2
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like you have a setup A (Master) ---> B (Replica)
>>>>>>>>>>>> ---> C Replica (base backup from Replica B)
>>>>>>>>>>>>
>>>>>>>>>>>> It seems you reused the recovery.conf (for replicating from master
>>>>>>>>>>>> to slave) on the new replica C, and there is a high probability that the
>>>>>>>>>>>> primary connection info in C's recovery.conf was not changed
>>>>>>>>>>>> to Replica B's connection info.
>>>>>>>>>>>>
>>>>>>>>>>>> During testing, providing B's connection info in C's
>>>>>>>>>>>> recovery.conf resolved the issue.
>>>>>>>>>>>>
>>>>>>>>>>>> Please verify the primary_conninfo parameter in C's
>>>>>>>>>>>> recovery.conf; that might resolve your problem.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>> ShreeyanshDBA Team
>>>>>>>>>>>> Shreeyansh Technologies
>>>>>>>>>>>> www.shreeyansh.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hi Lucas,
>>>>>>>>
>>>>>>>> It looks like the application_name parameter set in recovery.conf
>>>>>>>> may not match.
>>>>>>>> Please compare the synchronous_standby_names value set in
>>>>>>>> the postgresql.conf of Replica C with the value used as
>>>>>>>> application_name in recovery.conf.
>>>>>>>>
>>>>>>>> Also, check whether async replication works without
>>>>>>>> application_name in Replica C's recovery.conf, and check the status in
>>>>>>>> the pg_stat_replication view.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks and regards
>>>>>>>> ShreeyanshDBA Team
>>>>>>>> Shreeyansh Technologies
>>>>>>>> www.shreeyansh.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>>
>>>
>>> Lucas Possamai
>>>
>>> kinghost.co.nz
>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>
>>>
>
