Re: Recovery - New Slave PostgreSQL 9.2

From: "drum(dot)lucas(at)gmail(dot)com" <drum(dot)lucas(at)gmail(dot)com>
To: John Scalia <jayknowsunix(at)gmail(dot)com>
Cc: Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Recovery - New Slave PostgreSQL 9.2
Date: 2016-01-09 23:21:07
Message-ID: CAE_gQfUGC=vRY--pZ45C2UScy6rt+nR45m0MjtisaKoPe+VWtw@mail.gmail.com
Lists: pgsql-admin

>
> I'd recommend that you specify -X s, as just specifying -x or
> --xlog gives you the default value of fetch rather than stream. Also, the
> WAL directory listing you just provided indicates that your
> servers' timelines are far different.

I don't think it's necessary to use -X; check HERE
<http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html>

--xlog

Using this option is equivalent to using -X with method fetch.
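
For comparison, a minimal sketch of the two invocations (data directory and
host here are placeholders, not my real setup):

# -x / --xlog: required WAL is fetched at the end of the backup ("fetch")
pg_basebackup -D /path/to/data -x -h slave1 -U replicator --progress
# -X stream: WAL is streamed in parallel while the backup runs
pg_basebackup -D /path/to/data -X stream -h slave1 -U replicator --progress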

-------------------------------

Now, you're saying that one system went down, which is why you're trying to
> do this, but was it the first slave that failed? Or did your primary fail?
> That would possibly explain why the timelines are different. If your
> primary failed and this standby assumed command, then its timeline would
> have incremented. So, if you're trying to put this one back as a slave,
> that's not a trivial process. You'd have to set the old primary back
> up as a slave to the current primary, then execute another failover, this
> time back to your original primary, and then rebuild all the slaves all
> over again.
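
(To compare the timelines on the two servers, I'm assuming something like
this works -- the data directory path is just an example:

pg_controldata /var/lib/pgsql/9.2/data | grep TimeLineID

which should print the "Latest checkpoint's TimeLineID" line on each box.)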

*PAST SCENARIO:*
master1 --> slave1 --> slave2
            slave1 --> db-slave0  <-- *this one went down*

*NEW SCENARIO:*
master1 --> slave1 --> slave2
            slave1 --> newslave  (this is the one I'm setting up)
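
Just for reference, this is the recovery.conf I'd expect on newslave (a
sketch, assuming slave1 = 192.168.100.2 as in the messages below):

standby_mode = on
primary_conninfo = 'host=192.168.100.2 port=5432 user=replicator application_name=replication_new_slave'
recovery_target_timeline = 'latest'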

Lucas Possamai

kinghost.co.nz
<http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>

On 10 January 2016 at 12:16, John Scalia <jayknowsunix(at)gmail(dot)com> wrote:

> I'd recommend that you specify -X s, as just specifying -x or
> --xlog gives you the default value of fetch rather than stream. Also, the
> WAL directory listing you just provided indicates that your
> servers' timelines are far different.
>
> Now, you're saying that one system went down, which is why you're trying
> to do this, but was it the first slave that failed? Or did your primary fail?
> That would possibly explain why the timelines are different. If your
> primary failed and this standby assumed command, then its timeline would
> have incremented. So, if you're trying to put this one back as a slave,
> that's not a trivial process. You'd have to set the old primary back
> up as a slave to the current primary, then execute another failover, this
> time back to your original primary, and then rebuild all the slaves all
> over again.
>
> Just saying,
> Jay
>
> Sent from my iPad
>
> On Jan 9, 2016, at 3:48 PM, "drum(dot)lucas(at)gmail(dot)com" <drum(dot)lucas(at)gmail(dot)com>
> wrote:
>
> Hi John,
>
> First, when you built the slave server, I'm assuming you used
>> pg_basebackup, and if you did, did you specify -X s in your command?
>
>
> Yep. I ran pg_basebackup on the new slave, pulling from ANOTHER SLAVE...
> ssh postgres(at)slave1 'pg_basebackup --pgdata=- --format=tar
> --label=bb_master --progress --host=localhost --port=5432
> --username=replicator --xlog | pv --quiet --rate-limit 100M' | tar -x
> --no-same-owner
>
> *-x = --xlog*
>
> On my new slave, I've got all the WAL archives. (The master copies the
> WAL all the time...)
> ls /var/lib/pgsql/9.2/wal_archive:
> 0000000200000C6A0000002D
> 0000000200000C6A0000002E
>
> and not:
> ../wal_archive/0000000400000C68000000C8 (not found)
> ../wal_archive/00000005.history (not found)
>
> Remember that I'm trying to do cascading replication (it was working
> with another slave, but that server went down and I'm trying to set up a
> new one)
>
> I would suggest, in spite of the 2TB size, rebuilding the standby
>> servers with a proper pg_basebackup.
>
>
> I've already run pg_basebackup more than once, and I always get the
> same error... :(
>
> Is there anything else I can try, guys? Please help hehehe
>
>
>
> Lucas Possamai
>
> kinghost.co.nz
> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>
> On 10 January 2016 at 10:33, John Scalia <jayknowsunix(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> I'm a little late to this thread, but in looking at the errors you
>> originally posted, two things come to mind:
>>
>> First, when you built the slave server, I'm assuming you used
>> pg_basebackup, and if you did, did you specify -X s in your command?
>>
>> Second, the missing history file isn't an issue, in case you're
>> unfamiliar with this. However, yeah, the missing WAL segment is, as is
>> the bad timeline error. Is that missing segment still on your primary?
>> You know, you could just copy it manually to your standby and start from
>> that. As for the timeline error, that's disturbing to me, as it's
>> claiming the primary is actually a failed-over standby. AFAIK, that's the
>> main, if not the only, way transaction timelines increment.
>>
>> I would suggest, in spite of the 2TB size, rebuilding the standby
>> servers with a proper pg_basebackup.
>> --
>> Jay
>>
>> Sent from my iPad
>>
>> On Jan 9, 2016, at 2:19 PM, "drum(dot)lucas(at)gmail(dot)com" <drum(dot)lucas(at)gmail(dot)com>
>> wrote:
>>
>> Hi, thanks for your reply... I've been working on this problem for 20h =(
>>
>> *# cat postgresql.conf | grep synchronous_standby_names*
>> #synchronous_standby_names = '' (it's commented out)
>>
>> *# cat postgresql.conf | grep application_name*
>> log_line_prefix = '%m|%p|%q[%c](at)%r|%u|%a|%d '
>> ( %a = application name )
>>
>> I can't resync the whole DB again, because it has 2TB of data :(
>>
>> Is there anything else I can do?
>> Thank you
>>
>>
>>
>> Lucas Possamai
>>
>> kinghost.co.nz
>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>
>> On 10 January 2016 at 04:22, Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>
>> wrote:
>>
>>>
>>>
>>> On Sat, Jan 9, 2016 at 3:28 PM, drum(dot)lucas(at)gmail(dot)com <
>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>
>>>> My recovery.conf was already like that!
>>>> I was already doing it that way... I still have the problem =\
>>>>
>>>> Is there anything I can do?
>>>>
>>>>
>>>>
>>>> Lucas Possamai
>>>>
>>>> kinghost.co.nz
>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>
>>>> On 9 January 2016 at 22:53, Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Lucas,
>>>>>
>>>>> Yes, now recovery.conf looks good.
>>>>> Hope this solves your problem.
>>>>>
>>>>>
>>>>> Thanks and regards,
>>>>> ShreeyanshDBA Team
>>>>> Shreeyansh Technologies
>>>>> www.shreeyansh.com
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Jan 9, 2016 at 3:07 PM, drum(dot)lucas(at)gmail(dot)com <
>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>
>>>>>> Hi there!
>>>>>>
>>>>>> Yep, it's correct:
>>>>>> It looks like you have a setup A (Master) ---> B (Replica) ---> C
>>>>>> (Replica, base backup from Replica B)
>>>>>>
>>>>>> Master (A): 192.168.100.1
>>>>>> Slave1 (B): 192.168.100.2
>>>>>> Slave2 (C): 192.168.100.3
>>>>>>
>>>>>> My recovery.conf in slave2(C) is:
>>>>>>
>>>>>> restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
>>>>>> archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
>>>>>> recovery_target_timeline = 'latest'
>>>>>> standby_mode = on
>>>>>> primary_conninfo = 'host=192.168.100.2 port=5432 user=replicator application_name=replication_slave02'
>>>>>>
>>>>>> So, it seems right to me... Is that what you mean?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> Lucas Possamai
>>>>>>
>>>>>> kinghost.co.nz
>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>
>>>>>> On 9 January 2016 at 22:25, Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Sat, Jan 9, 2016 at 8:29 AM, drum(dot)lucas(at)gmail(dot)com <
>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>
>>>>>>>> *NOTE: I ran the pg_basebackup from another STANDBY SERVER, not
>>>>>>>> from the MASTER*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Lucas Possamai
>>>>>>>>
>>>>>>>> kinghost.co.nz
>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>
>>>>>>>> On 9 January 2016 at 15:28, drum(dot)lucas(at)gmail(dot)com <
>>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>>
>>>>>>>>> Still trying to solve the problem...
>>>>>>>>> Can anyone help, please?
>>>>>>>>>
>>>>>>>>> Lucas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Lucas Possamai
>>>>>>>>>
>>>>>>>>> kinghost.co.nz
>>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>>
>>>>>>>>> On 9 January 2016 at 14:45, drum(dot)lucas(at)gmail(dot)com <
>>>>>>>>> drum(dot)lucas(at)gmail(dot)com> wrote:
>>>>>>>>>
>>>>>>>>>> Sure... Here's the full information:
>>>>>>>>>>
>>>>>>>>>> http://superuser.com/questions/1023770/new-postgresql-slave-server-error-timeline
>>>>>>>>>>
>>>>>>>>>> recovery.conf:
>>>>>>>>>>
>>>>>>>>>> restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
>>>>>>>>>> archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
>>>>>>>>>> recovery_target_timeline = 'latest'
>>>>>>>>>> standby_mode = on
>>>>>>>>>> primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator application_name=replication_new_slave'
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Lucas Possamai
>>>>>>>>>>
>>>>>>>>>> kinghost.co.nz
>>>>>>>>>> <http://forum.kinghost.co.nz/memberlist.php?mode=viewprofile&u=2&sid=e999f8370385657a65d41d5ff60b0b38>
>>>>>>>>>>
>>>>>>>>>> On 9 January 2016 at 14:37, Ian Barwick <ian(at)2ndquadrant(dot)com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 16/01/09 9:23, drum(dot)lucas(at)gmail(dot)com wrote:
>>>>>>>>>>> > Hi all!
>>>>>>>>>>> >
>>>>>>>>>>> > I've done the pg_basebackup from the live server to a new
>>>>>>>>>>> > slave server...
>>>>>>>>>>> >
>>>>>>>>>>> > I've recovered the WAL files, but now that I've configured it
>>>>>>>>>>> > to replicate from the master (recovery.conf), I get this error:
>>>>>>>>>>> >
>>>>>>>>>>> > ../wal_archive/0000000400000C68000000C8 not found
>>>>>>>>>>> > ../wal_archive/00000005.history not found
>>>>>>>>>>> >
>>>>>>>>>>> > FATAL: timeline 2 of the primary does not match recovery
>>>>>>>>>>> > target timeline 1
>>>>>>>>>>>
>>>>>>>>>>> Can you post the contents of your recovery.conf file, suitably
>>>>>>>>>>> anonymised if necessary?
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Ian Barwick
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>> Hi Lucas,
>>>>>>>
>>>>>>> Following your question, I reproduced the same error:
>>>>>>>
>>>>>>> cp: cannot stat `/pgdata/arch/00000003.history': No such file or
>>>>>>> directory
>>>>>>> 2016-01-09 14:11:42 IST FATAL: timeline 1 of the primary does not
>>>>>>> match recovery target timeline 2
>>>>>>>
>>>>>>> It looks like you have a setup A (Master) ---> B (Replica) ---> C
>>>>>>> (Replica, base backup from Replica B)
>>>>>>>
>>>>>>> It seems you used the recovery.conf (the one that replicates from
>>>>>>> master to slave) for the new replica C, and there is a high
>>>>>>> probability that the primary connection info in C's recovery.conf
>>>>>>> was not changed to Replica B's connection info.
>>>>>>>
>>>>>>> During testing, providing B's connection info in C's recovery.conf
>>>>>>> resolved the issue.
>>>>>>>
>>>>>>> Please verify the primary connection info parameter in recovery.conf
>>>>>>> (replica C); that might resolve your problem.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks and regards,
>>>>>>> ShreeyanshDBA Team
>>>>>>> Shreeyansh Technologies
>>>>>>> www.shreeyansh.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> Hi Lucas,
>>>
>>> It looks like the application_name parameter set in recovery.conf may
>>> be mismatched.
>>> Please compare the synchronous_standby_names value set in the
>>> postgresql.conf of Replica C with the value used as application_name
>>> in recovery.conf.
>>>
>>> Also, check whether async replication works without using
>>> application_name in the recovery.conf of Replica C, and check the
>>> status in the pg_stat_replication view.
>>>
>>>
>>> Thanks and regards
>>> ShreeyanshDBA Team
>>> Shreeyansh Technologies
>>> www.shreeyansh.com
>>>
>>
>>
>
