From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | PITR Phase 1 - Test results |
Date: | 2004-04-26 15:37:27 |
Message-ID: | 1082991844.3999.60.camel@stromboli |
Lists: | pgsql-hackers |
I've now completed the coding of Phase 1 of PITR.
This allows a backup to be recovered and then rolled forward (all the
way) on transaction logs. This proves that the code and the design work,
and also validates many of the earlier assumptions that were the subject
of much debate.
As noted in the previous designs, PostgreSQL talks to an external
archiver using the XLogArchive API.
I've now completed:
- the changes to PostgreSQL
- a simple archiving utility, pg_arch
Using both of these together, I have successfully:
- started pg_arch
- started postgres
- taken a backup using tar
- run pgbench for an extended period, so that the transaction logs taken
at the start have long since been recycled
- killed postmaster
- waited for completion
- rm -R $PGDATA
- restored using tar
- restored xlogs from archive directory
- started postmaster and watched it recover to end of logs
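The shape of that test cycle can be sketched with plain files. This is only an illustration under stated assumptions, not the actual test harness: it does not start postgres, pg_arch or pgbench. The directory layout, the segment names (seg_0000 and so on) and the function run_cycle are all made up for the example; a file copy stands in for the archiver, and deleting the local segment stands in for log recycling.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def run_cycle(root: Path, segments: int = 5) -> list[str]:
    """Simulate backup, log archiving, loss of the data directory, and restore."""
    pgdata = root / "pgdata"      # hypothetical stand-in for $PGDATA
    xlog = pgdata / "pg_xlog"     # transaction log directory
    archive = root / "archive"    # where the archiver copies closed logs
    backup = root / "base.tar"    # the tar backup
    xlog.mkdir(parents=True)
    archive.mkdir()
    (pgdata / "table.dat").write_text("rows as of backup\n")

    # Take the base backup with tar.
    with tarfile.open(backup, "w") as tar:
        tar.add(pgdata, arcname="pgdata")

    # Workload: each closed segment is archived, then removed locally,
    # so the logs from the start of the run no longer exist in pg_xlog.
    for n in range(segments):
        seg = xlog / f"seg_{n:04d}"
        seg.write_text(f"changes in segment {n}\n")
        shutil.copy2(seg, archive / seg.name)
        seg.unlink()

    # Equivalent of rm -R $PGDATA.
    shutil.rmtree(pgdata)

    # Restore the backup, then put the archived logs back in place.
    with tarfile.open(backup) as tar:
        tar.extractall(root)
    for seg in sorted(archive.iterdir()):
        shutil.copy2(seg, xlog / seg.name)

    # Segments now available for roll-forward, in replay order.
    return sorted(p.name for p in xlog.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_cycle(Path(tmp)))
```

The point of the sketch is the ordering: the backup is taken before the workload, the early logs survive only in the archive, and recovery needs every archived segment restored before the postmaster is started.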
This has been run through a number of times on non-trivial tests, and
I've sat and watched the beast at work to make sure nothing weird was
happening with timing.
At this stage:
Missing Functions -
- recovery does NOT yet stop at a specified point-in-time (that was
always planned for Phase 2)
- few more log messages required to report progress
- debug mode required to allow most to be turned off
Wrinkles
- code is system testable, but not as cute as it could be
- input from committers is now sought to complete the work
- you are strongly advised not to treat any of the patches as usable in
any real world situation YET - that bit comes next
Bugs
- two bugs currently occur during some tests:
1. the notification mechanism as originally designed causes ALL backends
to report that a log file has closed. That works most of the time,
though it does give rise to occasional timing errors - nothing too
serious, but this inexactness could lead to later errors.
2. After restore, the notification system doesn't recover fully - this
is a straightforward one
I'm building a full patchset for this code and will upload it soon. As
you might expect over the time it's taken me to develop this, some bitrot
has set in, so I'm rebuilding it against the latest dev version now, and
will complete fixes for the two bugs mentioned above.
I'm sure some will say "no words, show me the code"... I thought you all
would appreciate some advance warning of this, to plan time to
investigate and comment upon the coding.
Best Regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com