From: | Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com> |
---|---|
To: | hlinnakangas(at)vmware(dot)com |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Review of pg_rewind |
Date: | 2013-10-23 09:07:43 |
Message-ID: | CAF8Q-Gw1HBKzpSEVtotLg=DR+Ee-6q59qQfhY5tor3FYAenyrA@mail.gmail.com |
Lists: | pgsql-hackers |
While testing pg_rewind, I encountered the following problems.
I used the following process for testing; please correct me if I am doing
it the wrong way.
Problem-1:
pg_rewind gives the error "target master must be shut down cleanly." when
the master has crashed unexpectedly.
1. Set up streaming replication (standalone machine: master server port
5432, standby server port 5433).
2. Perform some operation on the master server:
postgres=# create table test(id int);
3. Crash the master's postgres process:
kill -9 [pid of postgres process of master server]
4. Promote the standby server.
5. Run pg_rewind:
$ /samrat/postgresql/contrib/pg_rewind/pg_rewind -D
/samrat/master-data/ --source-server='host=localhost port=5433
dbname=postgres' -v
connected to remote server
fetched file "global/pg_control", length 8192
target master must be shut down cleanly.
6. Check the master's control information:
$ /samrat/postgresql/install/bin/pg_controldata
/samrat/master-data/ | grep "Database cluster state"
Database cluster state: in production
IIUC, this is because pg_rewind checks the target's control data for a
clean shutdown before resynchronizing the PostgreSQL data directories.
But in real-world scenarios, for example when the master crashes due to a
hardware failure, its control data still shows the state "in production",
so pg_rewind will fail this check.
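To make the failing precondition concrete, here is a minimal shell sketch of the kind of check involved. It assumes (not verified against the pg_rewind source) that the check amounts to requiring the "Database cluster state" line printed by pg_controldata to read exactly "shut down"; the function names cluster_state and target_shut_down_cleanly are hypothetical and not part of pg_rewind.

```shell
#!/bin/sh
# Minimal sketch, assuming pg_rewind's precondition is equivalent to
# requiring pg_controldata to report "Database cluster state: shut down".
# Both function names are hypothetical helpers, not part of pg_rewind.

# Print the value of the "Database cluster state" line read from stdin
# (for example, piped from pg_controldata).
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Succeed (exit status 0) only if the state read from stdin is "shut down".
# A master killed with kill -9 still reports "in production", so this fails.
target_shut_down_cleanly() {
    state=$(cluster_state)
    [ "$state" = "shut down" ]
}
```

For example, `pg_controldata /samrat/master-data/ | target_shut_down_cleanly || echo "target master must be shut down cleanly."` would report the failure for the crashed master from step 6, whose state is "in production".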
Problem-2:
pg_rewind gives an error for a zero-length WAL record.
1. Set up streaming replication (standalone machine: master server port
5432, standby server port 5433).
2. Cleanly shut down the master (do not add any data on the master).
3. Promote the standby server.
4. Create a table on the new master (the promoted standby):
postgres=# create table test(id int);
5. Run pg_rewind:
$ /samrat/postgresql/contrib/pg_rewind/pg_rewind -D
/samrat/master-data/ --source-server='host=localhost port=5433
dbname=postgres' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/00000002.history", length 41
Last common WAL position: 0/4000090 on timeline 1
could not previous WAL record at 0/4000090: record with zero
length at 0/4000090
Also, as you already noted in the pg_rewind README, it has a problem
with tablespace support.
I will continue testing it further to help improve it :)