From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Rick Gigger <rick(at)alpinenetworking(dot)com> |
Cc: | Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: incremental backups |
Date: | 2006-02-24 14:12:12 |
Message-ID: | 200602241412.k1OECC121000@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I have applied the following patch adds to the paragraph after the one
you quoted below. I just added mention that the start/stop time _and_
wal file names are in the history file.
---------------------------------------------------------------------------
Rick Gigger wrote:
> I've started writing some scripts to set up incremental backup to my
> taste. I just discovered something and thought I would revisit this
> thread briefly.
>
> When you go to restore from a give base file system backup you need
> to know the start WAL file that you need and the end WAL file that
> you need. (You will most likely have many files beyond the "stop"
> file but you must have at least up to the "stop" file for the restore
> to work.
>
> Now if you try to restore but you don't have the "stop" WAL file
> postges will die on recovery and tell you that it can't recover
> forward far enough to make the backup consistent. But I wanted to
> know the easiest way to verify if you indeed had the necessary files
> without having to actually do a restore and have postgres tell you if
> it succeeded or not.
>
> Perhaps no one understood me because the answer I was looking for was
> too obvious. But what I really wanted to know was how do you know
> what the "stop" file is. It informs you of the start file all over
> the place when doing the base backups but I thought I would have to
> do something clever to figure out the stop file on my own. But
> luckily I don't. The backup history file has too lines like this:
>
> START WAL LOCATION: 0/88F21D0C (file 000000010000000000000088)
> STOP WAL LOCATION: 0/88F21D50 (file 000000010000000000000088)
>
> It was clear to me from the docs how to figure out what the start
> file is but the end file was a mystery until I actually created a
> backup history file and looked in it. The only place I can find in
> the Online Backup instructions where this is indicated is this
> paragraph:
>
> "To make use of this backup, you will need to keep around all the WAL
> segment files generated during and after the file system backup. To
> aid you in doing this, the pg_stop_backup function creates a backup
> history file that is immediately stored into the WAL archive area.
> This file is named after the first WAL segment file that you need to
> have to make use of the backup. For example, if the starting WAL file
> is 0000000100001234000055CD the backup history file will be named
> something like 0000000100001234000055CD.007C9330.backup. (The second
> number in the file name stands for an exact position within the WAL
> file, and can ordinarily be ignored.) Once you have safely archived
> the file system backup and the WAL segment files used during the
> backup (as specified in the backup history file), all archived WAL
> segments with names numerically less are no longer needed to recover
> the file system backup and may be deleted. However, you should
> consider keeping several backup sets to be absolutely certain that
> you can recover your data. Keep in mind that only completed WAL
> segment files are archived, so there will be delay between running
> pg_stop_backup and the archiving of all WAL segment files needed to
> make the file system backup consistent."
>
> Reading it now it seems obvious that the file would contain not only
> the start WAL file but also the Stop WAL file but when going over the
> directions the first time it did not pick up on it. And it left me
> thinking I would have to use some hack to figure it out if I ever
> wanted to test a base backup. It would have been less confusing to
> me if it just said right in the docs: "The backup history file
> contains both the start WAL file name and the Stop WAL file name" or
> something like that just to make it perfectly clear.
>
> Now that I know this I can extract that filename from the backup
> history file, check to see if it has been archived and copy it
> somewhere if it hasn't been archived yet. I'm pretty sure that I can
> assume that all files before the stop file have already been
> archived. So once I backup the stop file I can be positive that the
> base backup I just made will be valid when I try to restore from it.
>
> This lessens my need for the "get current WAL file" functionality in
> this context. It will still be nice to have in the context of
> backing it up every five minutes or so in case a WAL file takes a
> long time to fill up.
>
> Anyway I would have been less confused if the docs had made it more
> clear that the name of the stop wal file was in the backup history file.
>
> Rick
>
>
> On Jan 30, 2006, at 10:20 PM, Bruce Momjian wrote:
>
> >
> > Yes, I think copying it while it is being written is safe.
> >
> > ----------------------------------------------------------------------
> > -----
> >
> > Rick Gigger wrote:
> >> Yes! Thanks you! That is exactly what I was looking for.
> >>
> >> So I take it that this means that it is save to copy the current in
> >> use WAL file even as it is being written to?
> >> And it also means that if I copy it with my physical file system
> >> backup then I should have the last file that I need to restore from
> >> that physical backup?
> >>
> >> So if I write my own backup_latest_WAL_file.sh script (I think I
> >> found one on the list from Simon Riggs) then I can do what I need to
> >> do before those todo items get done? Or will I need to wait till
> >> postgres gives me the ability to safely copy the file?
> >>
> >>
> >>
> >> On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:
> >>
> >>>
> >>> Unfortunately, I think I understand your question. :-)
> >>>
> >>> These TODO items are what you need:
> >>>
> >>> * Point-In-Time Recovery (PITR)
> >>>
> >>> o Allow point-in-time recovery to archive partially filled
> >>> write-ahead logs [pitr]
> >>>
> >>> Currently only full WAL files are archived. This means
> >>> that the
> >>> most recent transactions aren't available for recovery
> >>> in case
> >>> of a disk failure. This could be triggered by a user
> >>> command or
> >>> a timer.
> >>>
> >>> o Automatically force archiving of partially-filled WAL
> >>> files when
> >>> pg_stop_backup() is called or the server is stopped
> >>>
> >>> Doing this will allow administrators to know more
> >>> easily when
> >>> the archive contains all the files needed for point-in-
> >>> time
> >>> recovery.
> >>>
> >>> I will try to push to have them done for 8.2.
> >>>
> >>> --------------------------------------------------------------------
> >>> --
> >>> -----
> >>>
> >>> Rick Gigger wrote:
> >>>> I guess my email wasn't all that clear. I will try to rephrase. I
> >>>> am moving from using the old style pg_dump for backups to using
> >>>> incrementals and want to make sure I understand the process
> >>>> before I
> >>>> go about writing a bunch of scritps.
> >>>>
> >>>> To me setting up incremental backup consists of the following
> >>>> components:
> >>>>
> >>>> 1) Setting up the WAL archiving. This one is trivial.
> >>>> 2) Doing physical dumps of the $PGDATA directory. This one is once
> >>>> again trivial.
> >>>> 3) Knowing which physical dumps are Good and Not Good. For a given
> >>>> physical dump D there is are WAL archive files Dstart and Dend for
> >>>> which you much have Dstart and Dend and all files in between.
> >>>> If you
> >>>> have all those files then the physical dump is Good. If you don't
> >>>> have them then the dump is worthless to you.
> >>>> 4) Knowing which dumps and which archive files can be deleted.
> >>>> This
> >>>> depends on a number of factors.
> >>>> a) How far back do you want to be able to do PITR
> >>>> b) How much space do you have / want to use for PITR
> >>>> c) Which physical dumps are Good and which are Not Good. (see #3)
> >>>>
> >>>> Now I think I have a pretty good plan here except for #3 (and so #4
> >>>> then also suffers).
> >>>>
> >>>> Just as an example lets say I'm not concerned so much with PITR
> >>>> as I
> >>>> am recovering from a db crash. I've got all the backups files saved
> >>>> to my backup db server so I can failover to it if my primary db
> >>>> server dies. I just want to make sure I've got one physical dump
> >>>> that is good. (This is not my actual situation but it
> >>>> illustrated my
> >>>> point better.)
> >>>>
> >>>> Now when I do a physical dump it is not a Good dump. That is I
> >>>> don't
> >>>> have the end archive file necessary to recover from that physical
> >>>> dump. That is to say that when I call pg_backup_start() then copy
> >>>> $PGDATA then call pg_backup_stop() postgres might be on say WAL
> >>>> archive file #5. Once the physical dump is completed WAL archive
> >>>> file #5 hasn't been archived yet. I only have up to #4. So if I
> >>>> delete my old physical dumps and all I've got is this most
> >>>> recent one
> >>>> and my database crashes before #5 gets archived then I am hosed. I
> >>>> have no good physical backups to start from.
> >>>>
> >>>> My main question is about the best way to figure out when a
> >>>> physical
> >>>> dump is Good.
> >>>>
> >>>> One strategy is to always keep around lots of physical dumps.
> >>>> If you
> >>>> keep around 100 dumps you can be pretty sure that in the space of
> >>>> time that those physical dumps take place that at least one WAL
> >>>> file
> >>>> was archived. In fact if you keep 2 physical dumps you can be
> >>>> fairly
> >>>> certain of this. If not then you really need to space our your
> >>>> dumps
> >>>> more.
> >>>>
> >>>> Is this making sense at this point?
> >>>>
> >>>> The problem is that the WAL archiving is triggered by postgres and
> >>>> the rate at which the db is updated. The physical dumps are
> >>>> triggered by cron and on a purely time based schedule. So in
> >>>> theory
> >>>> if you had the physical dumps happening once a day but for some odd
> >>>> reason no one updated the database for 4 days then all of a sudden
> >>>> you'd have 2 physical backups and neither of them are good. If
> >>>> you're db crashes during that time you are hosed.
> >>>>
> >>>> Maybe I am arguing a point that is just stupid because this will
> >>>> never happen in real life. But in that it is my backups system
> >>>> that
> >>>> I will be using to recover from complete and total disaster I just
> >>>> want to have all my bases covered.
> >>>>
> >>>> So my ideas on how to determine if a physical dump is Good are as
> >>>> follows.
> >>>>
> >>>> 1) When you do the physical backup (after dumping the $PGDATA
> >>>> dir but
> >>>> before calling pg_stop_backup() ) determine the current WAL archive
> >>>> file. Mark somewhere in the backed up physical dump the last file
> >>>> needed for the dump to be considered good. Then your deletion
> >>>> scripts can look at the WAL archive files you have and the last one
> >>>> required for the dump to be Good and determine if the dump is
> >>>> Good or
> >>>> not.
> >>>>
> >>>> 2) After doing the physical dump but before calling
> >>>> pg_stop_backup()
> >>>> just copy the current WAL file to the physical dump. If that file
> >>>> later gets archived then the restore commands overwrites your
> >>>> partially completed one so it doesn't hurt but you know that
> >>>> when you
> >>>> call pg_stop_backup() that that physical dump is good. (Is it
> >>>> ok to
> >>>> copy the current WAL file while it is still in use?)
> >>>>
> >>>> Is anyone taking one of these or any other precautions to make sure
> >>>> they've got a good physical dump or does everyone just keep a whole
> >>>> bunch of dumps around, and then actually restore the dump to see if
> >>>> it is good and if not go back to a previous dump?
> >>>>
> >>>> I hope that makes more sense.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Rick
> >>>>
> >>>> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
> >>>>
> >>>>> Rick Gigger wrote:
> >>>>>> Um, no you didn't read my email at all. I am aware of all of
> >>>>>> that
> >>>>>> and it is clearly outlined in the docs. My email was about a
> >>>>>> specific detail in the process. Please read it if you want to
> >>>>>> know what my actual question was.
> >>>>>
> >>>>> I'm not sure your email is quite right as regards the process. You
> >>>>> need:
> >>>>> 1. the filesystem backup
> >>>>> 2. the WAL file indicated in the history-file
> >>>>> 3. all the WAL files later than that
> >>>>> to get up to "now".
> >>>>>
> >>>>> If you don't want to replay up to "now" then you will not need
> >>>>> some
> >>>>> of the more recent WAL files. You can't afford to throw them away
> >>>>> though since you've got a rolling backup system running and the
> >>>>> whole point is so you can recover to any point you like.
> >>>>>
> >>>>> You can however throw away any WAL files older than that indicated
> >>>>> in the history file for your current filesystem-backup. You can
> >>>>> then only restore from that point in time forward.
> >>>>>
> >>>>> There is no "last one" in the WAL set unless you know the time you
> >>>>> want to restore to. Indeed, the "last one" might not be "full" yet
> >>>>> and therefore archived if you want to restore to 10 seconds ago.
> >>>>>
> >>>>> Or am I mis-understanding your email too?
> >>>>>
> >>>>> --
> >>>>> Richard Huxton
> >>>>> Archonet Ltd
> >>>>>
> >>>>
> >>>>
> >>>> ---------------------------(end of
> >>>> broadcast)---------------------------
> >>>> TIP 4: Have you searched our list archives?
> >>>>
> >>>> http://archives.postgresql.org
> >>>>
> >>>
> >>> --
> >>> Bruce Momjian | http://candle.pha.pa.us
> >>> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> >>> + If your life is a hard drive, | 13 Roberts Road
> >>> + Christ can be your backup. | Newtown Square,
> >>> Pennsylvania 19073
> >>>
> >>
> >>
> >> ---------------------------(end of
> >> broadcast)---------------------------
> >> TIP 4: Have you searched our list archives?
> >>
> >> http://archives.postgresql.org
> >>
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square,
> > Pennsylvania 19073
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> > http://archives.postgresql.org
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>
--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com
+ If your life is a hard drive, Christ can be your backup. +
Attachment | Content-Type | Size |
---|---|---|
unknown_filename | text/plain | 1.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Marc G. Fournier | 2006-02-24 14:52:53 | Re: Oracle purchases Sleepycat - is this the "other shoe" |
Previous Message | Bruce Momjian | 2006-02-24 13:26:55 | Re: Does PG really lack a time zone for India? |