The Problem of Applying Point-in-time Recovery

From: Shih Théo <galaxyshih(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: The Problem of Applying Point-in-time Recovery
Date: 2015-06-20 10:11:56
Message-ID: CA+f59JnKdy0gxodzE8F5W=jM5FDQAR9iDzm70XUBgNygjw48Bw@mail.gmail.com
Lists: pgsql-admin

Dear Sir/Madam,

I am not sure whether it is proper to post my problem here. If not, please
forgive my ignorance and tell me where I should post it.

Recently I have been setting up point-in-time recovery on Debian with
PostgreSQL 8.3 (for certain reasons I cannot upgrade), but I have run into a
problem. I followed the instructions in the official documentation step by
step.

1. First, I modified postgresql.conf to enable WAL archiving and restarted
postgres.
2. Then I simply tarred the whole cluster data directory, ${PG_DATA}, to
serve as the base backup. I called pg_start_backup('label') before the tar
and pg_stop_backup() after it.
3. After that, I inserted some data into the database.
4. Next, I simulated a corrupted database that needs to be recovered:
   4.0. stopped postgres
   4.1. moved the WALs from ${PG_DATA}/pg_xlog to another directory
   4.2. untarred the base backup and moved the data to ${PG_DATA}
        (overwriting it)
   4.3. created recovery.conf with the following configuration (note that
        I store the WALs on a remote host):
          restore_command = 'rsync -a host_user@host_ip:/path/to/remote/host/wal/%f %p'
          recovery_target_time = 'YYYY-mm-dd HH:MM:SS'
          recovery_target_timeline = 'value'
   4.4. restarted postgres
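
To make steps 4.0 to 4.4 concrete, they can be sketched as a small bash script. This is only an illustration under assumptions, not the exact commands I typed: the function names and their arguments are hypothetical, pg_ctl is assumed to be on the PATH, and the base backup is assumed to be a tar archive created from inside the data directory. The script only defines functions; nothing runs until recover is called.

```shell
#!/bin/bash
# Sketch of recovery steps 4.0 to 4.4. Function and argument names are
# illustrative assumptions, not part of PostgreSQL.

# Step 4.3: write recovery.conf into the data directory.
# $1 = data directory, $2 = rsync source prefix of the WAL archive,
# $3 = recovery target time, $4 = optional recovery target timeline.
write_recovery_conf() {
    local pgdata=$1 archive=$2 target_time=$3 timeline=${4:-}
    {
        # %f and %p are expanded by PostgreSQL, not by the shell.
        printf "restore_command = 'rsync -a %s/%%f %%p'\n" "$archive"
        printf "recovery_target_time = '%s'\n" "$target_time"
        if [ -n "$timeline" ]; then
            printf "recovery_target_timeline = '%s'\n" "$timeline"
        fi
    } > "$pgdata/recovery.conf"
}

# Steps 4.0 to 4.4 in order; stops at the first command that fails.
# $1 = data directory, $2 = base backup tar file,
# $3 = directory that receives the old pg_xlog contents,
# $4 = rsync source prefix of the WAL archive, $5 = recovery target time.
recover() {
    local pgdata=$1 base_backup=$2 wal_save_dir=$3 archive=$4 target_time=$5
    pg_ctl -D "$pgdata" stop -m fast &&                        # 4.0
    mkdir -p "$wal_save_dir" &&
    mv "$pgdata"/pg_xlog/* "$wal_save_dir"/ &&                 # 4.1
    tar -xf "$base_backup" -C "$pgdata" &&                     # 4.2 (overwrites)
    write_recovery_conf "$pgdata" "$archive" "$target_time" && # 4.3
    pg_ctl -D "$pgdata" start                                  # 4.4
}
```

A call would look like: recover "$PG_DATA" /path/to/base.tar /path/to/saved_wal host_user@host_ip:/path/to/remote/host/wal 'YYYY-mm-dd HH:MM:SS', where the two local paths are placeholders.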

At first, everything was fine. The recovery succeeded: I could see from the
log that postgres restored the WALs, and the data I inserted in step 3 was
in the database. But when I repeated the recovery (that is, repeated steps
4.0 to 4.4), postgres very often failed to recover. Here is the error
message:

2015-06-18 20:22:02 GMT+8 LOG: restored log file "00000001000000000000002E.00000020.backup" from archive
2015-06-18 20:22:03 GMT+8 LOG: restored log file "00000001000000000000002E" from archive
2015-06-18 20:22:03 GMT+8 LOG: unexpected pageaddr 0/2A000000 in log file 0, segment 46, offset 0
2015-06-18 20:22:03 GMT+8 LOG: invalid checkpoint record
2015-06-18 20:22:03 GMT+8 FATAL: could not locate required checkpoint record
2015-06-18 20:22:03 GMT+8 HINT: If you are not restoring from a backup, try removing the file "/home/genie/db_mount_point/backup_label".
2015-06-18 20:22:03 GMT+8 LOG: startup process (PID 658) exited with exit code 1
2015-06-18 20:22:03 GMT+8 LOG: aborting startup due to startup process failure
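
For reference, the numbers in the "unexpected pageaddr" line can be decoded with a little arithmetic. The sketch below assumes the default WAL segment size of 16 MB (0x1000000 bytes); the two helper function names are made up for this illustration.

```shell
#!/bin/bash
# Decode the WAL numbers from the log above, assuming the default
# 16 MB WAL segment size (0x1000000 bytes). Helper names are hypothetical.

# Segment number encoded in the last 8 hex digits of a WAL file name
# (the name is timeline, log id and segment, 8 hex digits each).
wal_file_segment() {
    local name=$1
    echo $(( 16#${name:16:8} ))
}

# Segment number that a page address of the form "logid/offset" falls in.
pageaddr_segment() {
    local offset=${1#*/}
    echo $(( 16#$offset / 16#1000000 ))
}

wal_file_segment 00000001000000000000002E   # prints 46
pageaddr_segment 0/2A000000                 # prints 42
```

If I read the message correctly, this means the file restored as segment 46 starts with a page whose header says it belongs to segment 42, that is, to a file whose name ends in 2A.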

I do not know exactly what caused the problem. Did it happen because I
performed the recovery repeatedly? Please give me some suggestions.

Yours faithfully
