We've begun rolling 8.3 out to our remote sites and have already had
to do two recoveries using the PITR backup techniques. (Neither was
any fault of PostgreSQL. One server had a second drive in a RAID 5
array fail before the hot spare finished taking the place of the first
one. The other ran into problems running the scripts related to an
upgrade of our application software, and we decided to roll back and
leave them on the old release until we sorted out the problems.)
We've noticed a couple of things that seem to merit comment.
In both cases it seemed to take longer to apply the WAL files during
the recovery process than we're used to from 8.2. I don't have hard
numbers, but before I spend time investigating I thought I would ask
here whether that is already a known issue.
In the rollback from the bad application software update, we wanted to
restore up to the point where we shut down user access to the
database, and before we started running the update scripts. We had an
"at" job scheduled to kick this off at 17:00, so the recovery.conf
file had this line:
recovery_target_time = '2008-10-16 17:00:00.0'
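For context, here is a minimal sketch of the kind of recovery.conf this line sits in. The restore_command path is a hypothetical placeholder, not our actual archive location, and recovery_target_inclusive is shown at its documented default only to make the stop-point semantics explicit:

```
# Sketch of an 8.3-style recovery.conf; the archive path is a placeholder.
restore_command = 'cp /path/to/wal_archive/%f %p'

# Stop recovery at this point in time (interpreted in the server's time zone).
recovery_target_time = '2008-10-16 17:00:00.0'

# Documented default. It controls whether a transaction whose commit time is
# exactly the target time is included (true) or excluded (false).
recovery_target_inclusive = true
```

As I read the documentation, recovery_target_inclusive only matters for a transaction stamped with exactly the target time, so it should not by itself explain a record timestamped 23 seconds later.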
The log shows this:
[2008-10-16 23:03:43.146 CDT] 19951 LOG: restored log file "000000010000000100000056" from archive
[2008-10-16 23:04:26.005 CDT] 19951 LOG: recovery stopping before abort of transaction 77627, time 2008-10-16 17:00:23.205347-05
[2008-10-16 23:04:26.006 CDT] 19951 LOG: redo done at 1/56FD4918
[2008-10-16 23:04:26.006 CDT] 19951 LOG: last completed transaction was at log time 2008-10-16 17:00:23.205347-05
The line that gives me pause is "last completed transaction". Did it
really replay a transaction from more than 23 seconds after the
recovery_target_time, or was that the first transaction it didn't
replay?
-Kevin