Postgres 9.0.4 replication issue: FATAL: requested WAL segment 0000000100000B110000000D has already been removed

From: Ashish Gupta <ashish(dot)gupta(dot)cal(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Postgres 9.0.4 replication issue: FATAL: requested WAL segment 0000000100000B110000000D has already been removed
Date: 2011-11-19 09:44:37
Message-ID: CAH06V7NvSQX47+ZRKafMPX-OR0q5Zta9SotaA2RdiQ3-0pbytA@mail.gmail.com
Lists: pgsql-general

Hi,

Streaming replication is not taking place. The WAL segment that the Slave
is looking for does not exist on the Master.

Both Master and Slave are EC2 instances running Postgres 9.0.4 on
Ubuntu 10.04. As far as I can tell, replication had been stalled for
around 3 months. On the Master, a new 16 MB WAL segment is created every
2-5 minutes.

For replication, I am following this guide:
http://wiki.postgresql.org/wiki/Streaming_Replication

I am also referring:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
http://www.depesz.com/index.php/2010/03/11/setting-wal-replication/

Before starting the backup, I ensured the following:
- On the Slave, I cleared the contents of 'pg_xlog/*'.
- Both Master and Slave have the following in postgresql.conf:
  wal_level = archive
  hot_standby = off
- In postgresql.conf, the Master has:
  max_wal_senders = 5
  wal_keep_segments = 10
- On the Slave, recovery.conf has the following 3 parameters:
  standby_mode = 'on'
  primary_conninfo = 'host=10.218.61.143 port=5432 user=postgres'
  trigger_file = '/data/db/trigger_failover'
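
As a rough sanity check on these settings, the sketch below estimates how
much WAL history wal_keep_segments = 10 covers at the segment rate quoted
above. It assumes the Master keeps roughly that many past segments and no
more; checkpoint settings can retain extra segments, so treat the result as
an approximate lower bound rather than an exact figure. The constant and
function names are illustrative only.

```python
# Figures taken from the post; the interpretation is an assumption.
SEGMENT_MB = 16                 # size of one WAL segment
WAL_KEEP_SEGMENTS = 10          # wal_keep_segments on the Master
MINUTES_PER_SEGMENT = (2, 5)    # a new segment every 2-5 minutes


def retention_window_minutes(keep_segments, minutes_per_segment):
    """Rough time span covered by the segments kept for standbys."""
    return keep_segments * minutes_per_segment


fastest, slowest = MINUTES_PER_SEGMENT
retained_mb = WAL_KEEP_SEGMENTS * SEGMENT_MB
window = (
    retention_window_minutes(WAL_KEEP_SEGMENTS, fastest),
    retention_window_minutes(WAL_KEEP_SEGMENTS, slowest),
)

print(f"retained WAL: about {retained_mb} MB")
print(f"retention window: about {window[0]}-{window[1]} minutes")
```

If copying the base backup and starting the Slave takes longer than this
window, the segment containing the backup start point may already have been
recycled on the Master by the time the Slave asks for it.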

I used the following commands for the backup. As soon as the backup
finished, I immediately started postgres on the Slave.
psql -c "SELECT pg_start_backup('label', true)";
rsync -av --progress /data/db/main/ 10.40.89.9:/data/db/main/ \
  --exclude 'pg_log/*' --exclude 'pg_xlog/*' --exclude postmaster.pid \
  --exclude pg_hba.conf --exclude postgresql.conf;
psql -c "SELECT pg_stop_backup()";

On the Slave I see the following processes running:
$ ps -ef | grep postgres
postgres 1895 1 0 Nov18 ? 00:00:00 /usr/lib/postgresql/9.0/bin/postgres -D /data/db/main -c config_file=/etc/postgresql/9.0/main/postgresql.conf
postgres 1896 1895 0 Nov18 ? 00:00:00 postgres: startup process waiting for 0000000100000B110000000D

On the Slave, the log shows that it is unable to find the requested WAL
segment:
$ tail /var/log/postgresql/postgresql-9.0-main.log
2011-11-19 07:09:50 UTC LOG: streaming replication successfully connected to primary
2011-11-19 07:09:50 UTC FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0000000100000B110000000D has already been removed

I confirmed that requested WAL segment 0000000100000B110000000D doesn't
exist on Master.

On Master, process listing shows:
$ ps -ef | grep postgres
postgres 25395 25389 0 Nov14 ? 00:00:06 postgres: archiver process last was 0000000100000B1F00000081
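
To get a feel for how far apart the requested segment and the last archived
segment are, the two file names can be decoded and compared. The sketch
below splits a 24-character WAL file name into timeline, log file, and
segment numbers. It assumes the pre-9.3 layout with 255 segments of 16 MB
per log file; that constant and the helper names are assumptions for
illustration, not something taken from the logs.

```python
SEGMENTS_PER_LOG = 0xFF  # assumption: pre-9.3 layout, 16 MB segments


def parse_wal_name(name):
    """Split a WAL file name into (timeline, log, segment) integers."""
    if len(name) != 24:
        raise ValueError(f"unexpected WAL file name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log, seg


def segment_gap(older, newer):
    """Number of segments between two WAL files on the same timeline."""
    t1, log1, seg1 = parse_wal_name(older)
    t2, log2, seg2 = parse_wal_name(newer)
    if t1 != t2:
        raise ValueError("segments are on different timelines")
    return (log2 * SEGMENTS_PER_LOG + seg2) - (log1 * SEGMENTS_PER_LOG + seg1)


requested = "0000000100000B110000000D"   # what the Slave asks for
archived = "0000000100000B1F00000081"    # archiver "last was" on the Master

gap = segment_gap(requested, archived)
print(f"segments between requested and last archived: {gap}")
print(f"approximate WAL volume: {gap * 16 / 1024:.1f} GB")
```

Under that assumption, the requested segment is several thousand segments
behind the one most recently archived on the Master, which is far more than
wal_keep_segments = 10 would retain.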

The log on the Master also indicates that the requested WAL segment was
removed:
$ tail postgresql-2011-11-18_221110.csv
2011-11-18 23:15:01.355 PST,"postgres","",20523,"10.40.89.9:46157",4ec75775.502b,1,"authentication",2011-11-18 23:15:01 PST,5/703238,0,LOG,00000,"replication connection authorized: user=postgres host=10.40.89.9 port=46157",,,,,,,,,""
2011-11-18 23:15:01.356 PST,"postgres","",20523,"10.40.89.9:46157",4ec75775.502b,2,"startup",2011-11-18 23:15:01 PST,5/0,0,FATAL,58P01,"requested WAL segment 0000000100000B110000000D has already been removed",,,,,,,,,""

On the Slave, I even tried deleting everything under /data/db/main and
taking the backup again, but the issue persists.

This does not look like a case of a slow Slave failing to catch up with the
Master, because:
1) The error appears as soon as the Slave DB is started, so the Slave never
receives even the first WAL file.
2) Both machines are in the same EC2 zone and the backup runs at a fairly
good speed, so network connectivity problems are also ruled out.

I searched various forums where people encountered a similar error;
however, in all of those cases the WAL file still existed on the Master.
In this case the Master is not retaining the WAL file required by the
Slave.

I am unable to understand why the Master is not retaining the WAL files.
Any pointers or suggestions would be helpful.
Thanks for your attention.

Ashish
