Re: Trouble with replication

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: David Greco <David_Greco(at)harte-hanks(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Trouble with replication
Date: 2013-06-06 16:51:58
Message-ID: CAMkU=1zT2Rctrw3cGW1GC5aeVSOcNwTLh9iPbP-nVmewXssyfg@mail.gmail.com
Lists: pgsql-general

On Wed, Jun 5, 2013 at 1:39 PM, David Greco <David_Greco(at)harte-hanks(dot)com> wrote:

> I’ve set up two 9.2.4 servers as master and slave in a streaming
> replication scenario. I started with a fresh database on the master, set up
> the replication, then imported about 30GB of data using pg_restore. The
> master and slave are geographically separated, so replicating this
> amount of data can be expected to take hours. I saw from
> pg_last_xlog_receive_location and pg_last_xlog_replay_location that the
> slave began to receive the replication data, but it eventually quit with
> the following error in the log:
>
> 2013-06-05 16:28:43.198 EDT,,,19978,,51af9f7a.4e0a,2,,2013-06-05 16:28:42
> EDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
> requested WAL segment 000000010000000000000022 has already been removed
> ",,,,,,,,,""

What are the messages before and after this?
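
For reference, the segment name in that error is three 8-digit hexadecimal fields: timeline, log file id, and segment number. A minimal Python sketch that decodes it, assuming the default 16 MiB segment size (the helper names are hypothetical):

```python
# Sketch: split a WAL segment file name into its three 8-hex-digit fields.
# Assumes the default 16 MiB segment size; a different size changes the mapping.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size in bytes


def parse_wal_segment_name(name: str) -> tuple[int, int, int]:
    """Return (timeline, log_id, segment) for a 24-character WAL file name."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {name!r}")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


def segment_start_location(name: str) -> str:
    """Return the first WAL location covered by the segment, in X/X form."""
    _, log_id, segment = parse_wal_segment_name(name)
    return f"{log_id:X}/{segment * WAL_SEGMENT_SIZE:X}"


if __name__ == "__main__":
    seg = "000000010000000000000022"
    print(parse_wal_segment_name(seg))  # (1, 0, 34)
    print(segment_start_location(seg))  # 0/22000000
```

Comparing that start location with the values pg_last_xlog_receive_location reported on the slave can help show how far streaming had gotten before it failed.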

> Checking the master, I see that the file has in fact been removed from the
> pg_xlog directory. The master has archive_command set up to ship the WAL
> files to the slave, and the slave is set up with a restore_command to read
> them from that directory.
>

Are you sure that these are set up correctly? What happens if you comment
out primary_conninfo, so that the archive directory is the only way to
deliver the files?
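
One way to run that experiment is to generate a recovery.conf with primary_conninfo commented out. A minimal Python sketch, assuming a 9.2-style recovery.conf; the archive path and connection string are hypothetical placeholders, and the cp-based restore_command follows the usual documentation example:

```python
# Sketch: build recovery.conf text for a 9.2-era standby, optionally with
# primary_conninfo commented out so the archive is the only WAL source.
# The archive path and conninfo used below are hypothetical placeholders.


def build_recovery_conf(archive_dir: str, primary_conninfo: str,
                        archive_only: bool = False) -> str:
    lines = [
        "standby_mode = 'on'",
        # %f is the requested WAL file name, %p the path to copy it to.
        f"restore_command = 'cp {archive_dir}/%f \"%p\"'",
    ]
    conninfo_line = f"primary_conninfo = '{primary_conninfo}'"
    # Commenting this out leaves restore_command as the only way to get WAL.
    lines.append(("# " if archive_only else "") + conninfo_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_recovery_conf("/var/lib/pgsql/wal_archive",
                              "host=master.example.com user=replicator",
                              archive_only=True))
```

If the slave cannot make progress in this mode, the problem is on the archive_command/restore_command side rather than in streaming.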

> In fact, that WAL segment exists in the slave’s pg_xlog directory as well.
>

But is the existing file identical to the one on the master (and the one in
the archive directory)? It is probably a recycled file that has not yet been
overwritten with received contents. That is, it has the contents of some
past log file, but the name of some future one.
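
A quick way to check is to hash both copies of the segment. A minimal Python sketch using hashlib; the function names are hypothetical, and you would point it at the slave's pg_xlog copy and the archived copy:

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks rather than reading it all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_segment_contents(a: Path, b: Path) -> bool:
    """True if both copies of a WAL segment are byte-for-byte identical."""
    return file_sha256(a) == file_sha256(b)
```

For example, same_segment_contents(Path("pg_xlog/000000010000000000000022"), Path("/path/to/archive/000000010000000000000022")) returning False would be consistent with a recycled file on the slave.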

> Now, from what I can tell, the master archived this WAL file and then
> removed it from its pg_xlog directory (based on the wal_keep_segments
> setting). Why, then, did the slave not pick it up from the directory it
> was archived to? It is my understanding that log shipping via
> archive_command from master to slave is there precisely to prevent this
> scenario. What am I doing wrong? Below are some of the pertinent settings.
>
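
wal_keep_segments is counted in segments, not bytes, which makes it easy to underestimate what a 30GB restore needs. A rough Python sketch, assuming the default 16 MiB segment size; the value 256 is a hypothetical example, since the actual settings are not shown here, and the WAL volume of a restore is only assumed to be of the same order as the data loaded:

```python
# Back-of-the-envelope sketch: WAL retained by wal_keep_segments, and how many
# segments a given volume of WAL occupies. Assumes 16 MiB segments.
import math

SEGMENT_MIB = 16  # assumed default segment size


def retained_mib(wal_keep_segments: int) -> int:
    """MiB of WAL the master keeps in pg_xlog for standbys."""
    return wal_keep_segments * SEGMENT_MIB


def segments_for(wal_gib: float) -> int:
    """Number of segments needed to hold wal_gib GiB of WAL."""
    return math.ceil(wal_gib * 1024 / SEGMENT_MIB)


if __name__ == "__main__":
    print(retained_mib(256))  # 4096 (hypothetical setting: about 4 GiB kept)
    print(segments_for(30))   # 1920 segments for roughly 30 GiB of WAL
```

Under those assumptions, a slow standby could fall far outside what the master keeps, so the archive has to carry the rest.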

In my hands, this is what happens. After losing contact with the primary,
it starts pulling files from the archive until it runs out of those, then
tries to reconnect to the primary.
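
A simplified model of that behavior may make the failure mode clearer. This is a Python sketch under stated assumptions, not PostgreSQL's actual code; the function and exception names are hypothetical, and it treats each segment as available either through restore_command or from the primary's pg_xlog:

```python
class SegmentRemovedError(Exception):
    """Stands in for 'requested WAL segment ... has already been removed'."""


def replay_plan(needed: list[str], archive: set[str],
                primary: set[str]) -> list[tuple[str, str]]:
    """Return (segment, source) pairs in replay order for a simplified standby.

    needed:  segment names the standby must replay, in order
    archive: segment names available through restore_command
    primary: segment names the primary still keeps in pg_xlog
    """
    plan = []
    source = "archive"  # contact with the primary has just been lost
    for seg in needed:
        if source == "stream" and seg not in primary:
            source = "archive"  # streaming cannot supply it: fall back to the archive
        if source == "archive" and seg not in archive:
            source = "stream"   # archive exhausted: try to reconnect to the primary
        available = archive if source == "archive" else primary
        if seg not in available:
            raise SegmentRemovedError(
                f"requested WAL segment {seg} has already been removed")
        plan.append((seg, source))
    return plan


if __name__ == "__main__":
    segs = [f"0000000100000000000000{n:02X}" for n in (0x21, 0x22, 0x23)]
    # Working archive: the primary having removed 0x21 and 0x22 does not matter.
    print(replay_plan(segs, archive=set(segs[:2]), primary={segs[2]}))
    # Archive unusable: the standby hits the same kind of error as in the report.
    try:
        replay_plan(segs, archive=set(), primary={segs[2]})
    except SegmentRemovedError as exc:
        print("FATAL:", exc)
```

In this model the error can only occur when a segment is missing from both places, which is why the archive directory and restore_command are the first things to check.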

Cheers,

Jeff
