Re: PITR

From: Murthy Nunna <mnunna(at)fnal(dot)gov>
To: desmodemone <desmodemone(at)gmail(dot)com>, Raghavendra <raghavendra(dot)rao(at)enterprisedb(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR
Date: 2014-02-23 15:20:37
Message-ID: BD7563973EF8894D905C80593EE9A4ED3312DD@MAIL01.fnal.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Raghavendra,

Thanks for testing and confirming the behavior of "pause" setting.

While I understand your explanation, I feel I am still missing something. IMHO, when I say pause using "pause" setting, no matter what, I expect the recovery to wait for manual intervention. I myself can come up with number of reasons for doing so... e.g I may be purposely "hiding" some WALs somewhere else, or maybe I have several thousands of WALs that I want to parallelize the process of applying some logs while I recall some from tapes.

Let me know what you think.

Mat, This is a test database so I purposely lowered checkpoint/archive timeouts. Thank-you, I'll follow your advice for production systems to conserve space.

Thanks,
Murthy

From: desmodemone [mailto:desmodemone(at)gmail(dot)com]
Sent: Sunday, February 23, 2014 4:13 AM
To: Raghavendra
Cc: Murthy Nunna; pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] PITR

2014-02-23 7:08 GMT+01:00 Raghavendra <raghavendra(dot)rao(at)enterprisedb(dot)com<mailto:raghavendra(dot)rao(at)enterprisedb(dot)com>>:
On Sun, Feb 23, 2014 at 9:57 AM, Murthy Nunna <mnunna(at)fnal(dot)gov<mailto:mnunna(at)fnal(dot)gov>> wrote:
Hi Mat,

Thank you for the pointers on pause_at_recovery_target and recovery_target_time. It worked but I encountered an unexpected situation.

I wanted to test three recovery times by checking data at each point and then proceed to the next. It worked as expected first 2 recovery times but the last one did not give me an opportunity to check data. It simply completed recovery and switched timeline. This means I cannot rollforward anymore unless I restore the database again. Do you think I did something wrong?

You did right, for the first two time targets you have some wal files pending hence pause is allowed. Whereas for the last time target there are no archive files to pause so it has just opened the database as completion of PITR.

I am able to reproduce this on my local as well.

This is from my logs:

2014-02-06 18:54:54 PST-11634---[] LOG: restored log file "0000000100000001000000B8" from archive
2014-02-06 18:54:55 PST-11634---[] LOG: restored log file "0000000100000001000000B9" from archive
2014-02-06 18:54:55 PST-11634---[] LOG: consistent recovery state reached at 1/B90029C8
2014-02-06 18:54:55 PST-11632---[] LOG: database system is ready to accept read only connections
cp: cannot stat `/opt/PostgreSQL/9.3/archives93/0000000100000001000000BA': No such file or directory

B8,B9 applied and looking for next file BA which is not there and no point of pausing without any files, hence it has opened the database. But my guess is, what it has done is right, there no more files to pause and allow you to query, its like completion of PITR.

Same in your case, there may not be any files in your archives directory. You can check for '00000004000000080000000D' file in your archive directory.

,2014-02-22 22:04:16 CSTLOG: restored log file "00000004000000080000000C" from archive
cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000004000000080000000D': No such file or directory

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/

Raghavendra is right, the recovery applied successful your archived wal segment to the database and ended the loop in the xlog.c when it not found anymore records in the wal segments. Moreover the backend is telling you that the recovery phase ended at "2014-02-22 19:48:29.205898-06" while your target timeline was "2014-02-22 19:50:00" and it's interesting.
Did you create transactions and you committed them at that time or after ?
Normally, when you have to do a restore, probably you had a crash and your filesystem is not ok, so you use a new filesystem on another storage. So your last transactions are "lost", because those transactions was not still archived, infact those transactions were in the last wal segment of the crashed filesystem. If that file is not OK, you could not use it for complete the recovery.

If you have a RPO policy , you have to look at parameter archive_timeout , so you are sure your wal segments will be archived, and if you do that and your database does not create so much transactions , think about to compress them or use data deduplication at storage level or filesystem level , or you will have a lot of wasted space ( the wal segment even if is not full will write 16Mb of <data>+<00000..000>)

Have a nice day

Mat Dba

In response to

  • Re: PITR at 2014-02-23 10:13:01 from desmodemone

Responses

  • Re: PITR at 2014-02-23 18:06:59 from bricklen
  • Re: PITR at 2014-02-24 12:30:55 from Raghavendra

Browse pgsql-admin by date

  From Date Subject
Next Message bricklen 2014-02-23 18:06:59 Re: PITR
Previous Message desmodemone 2014-02-23 10:13:01 Re: PITR