From: | Rick Gigger <rick(at)alpinenetworking(dot)com> |
---|---|
To: | Rick Gigger <rick(at)alpinenetworking(dot)com> |
Cc: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: incremental backups |
Date: | 2006-01-30 23:43:30 |
Message-ID: | 19532E3F-5BB4-479B-8D58-8A2947402E14@alpinenetworking.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
And here is the real million dollar question. Let's say for some
reason I don't have the last WAL file I need for my backup to be
valid. Will it die and tell me it's bad or will it just start up
with a screwed up data directory?
On Jan 30, 2006, at 4:29 PM, Rick Gigger wrote:
> Yes! Thanks you! That is exactly what I was looking for.
>
> So I take it that this means that it is save to copy the current in
> use WAL file even as it is being written to?
> And it also means that if I copy it with my physical file system
> backup then I should have the last file that I need to restore from
> that physical backup?
>
> So if I write my own backup_latest_WAL_file.sh script (I think I
> found one on the list from Simon Riggs) then I can do what I need
> to do before those todo items get done? Or will I need to wait
> till postgres gives me the ability to safely copy the file?
>
>
>
> On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:
>
>>
>> Unfortunately, I think I understand your question. :-)
>>
>> These TODO items are what you need:
>>
>> * Point-In-Time Recovery (PITR)
>>
>> o Allow point-in-time recovery to archive partially filled
>> write-ahead logs [pitr]
>>
>> Currently only full WAL files are archived. This means
>> that the
>> most recent transactions aren't available for recovery
>> in case
>> of a disk failure. This could be triggered by a user
>> command or
>> a timer.
>>
>> o Automatically force archiving of partially-filled WAL
>> files when
>> pg_stop_backup() is called or the server is stopped
>>
>> Doing this will allow administrators to know more
>> easily when
>> the archive contains all the files needed for point-in-
>> time
>> recovery.
>>
>> I will try to push to have them done for 8.2.
>>
>> ---------------------------------------------------------------------
>> ------
>>
>> Rick Gigger wrote:
>>> I guess my email wasn't all that clear. I will try to rephrase. I
>>> am moving from using the old style pg_dump for backups to using
>>> incrementals and want to make sure I understand the process before I
>>> go about writing a bunch of scritps.
>>>
>>> To me setting up incremental backup consists of the following
>>> components:
>>>
>>> 1) Setting up the WAL archiving. This one is trivial.
>>> 2) Doing physical dumps of the $PGDATA directory. This one is once
>>> again trivial.
>>> 3) Knowing which physical dumps are Good and Not Good. For a given
>>> physical dump D there is are WAL archive files Dstart and Dend for
>>> which you much have Dstart and Dend and all files in between. If
>>> you
>>> have all those files then the physical dump is Good. If you don't
>>> have them then the dump is worthless to you.
>>> 4) Knowing which dumps and which archive files can be deleted. This
>>> depends on a number of factors.
>>> a) How far back do you want to be able to do PITR
>>> b) How much space do you have / want to use for PITR
>>> c) Which physical dumps are Good and which are Not Good. (see #3)
>>>
>>> Now I think I have a pretty good plan here except for #3 (and so #4
>>> then also suffers).
>>>
>>> Just as an example lets say I'm not concerned so much with PITR as I
>>> am recovering from a db crash. I've got all the backups files saved
>>> to my backup db server so I can failover to it if my primary db
>>> server dies. I just want to make sure I've got one physical dump
>>> that is good. (This is not my actual situation but it
>>> illustrated my
>>> point better.)
>>>
>>> Now when I do a physical dump it is not a Good dump. That is I
>>> don't
>>> have the end archive file necessary to recover from that physical
>>> dump. That is to say that when I call pg_backup_start() then copy
>>> $PGDATA then call pg_backup_stop() postgres might be on say WAL
>>> archive file #5. Once the physical dump is completed WAL archive
>>> file #5 hasn't been archived yet. I only have up to #4. So if I
>>> delete my old physical dumps and all I've got is this most recent
>>> one
>>> and my database crashes before #5 gets archived then I am hosed. I
>>> have no good physical backups to start from.
>>>
>>> My main question is about the best way to figure out when a physical
>>> dump is Good.
>>>
>>> One strategy is to always keep around lots of physical dumps. If
>>> you
>>> keep around 100 dumps you can be pretty sure that in the space of
>>> time that those physical dumps take place that at least one WAL file
>>> was archived. In fact if you keep 2 physical dumps you can be
>>> fairly
>>> certain of this. If not then you really need to space our your
>>> dumps
>>> more.
>>>
>>> Is this making sense at this point?
>>>
>>> The problem is that the WAL archiving is triggered by postgres and
>>> the rate at which the db is updated. The physical dumps are
>>> triggered by cron and on a purely time based schedule. So in theory
>>> if you had the physical dumps happening once a day but for some odd
>>> reason no one updated the database for 4 days then all of a sudden
>>> you'd have 2 physical backups and neither of them are good. If
>>> you're db crashes during that time you are hosed.
>>>
>>> Maybe I am arguing a point that is just stupid because this will
>>> never happen in real life. But in that it is my backups system that
>>> I will be using to recover from complete and total disaster I just
>>> want to have all my bases covered.
>>>
>>> So my ideas on how to determine if a physical dump is Good are as
>>> follows.
>>>
>>> 1) When you do the physical backup (after dumping the $PGDATA dir
>>> but
>>> before calling pg_stop_backup() ) determine the current WAL archive
>>> file. Mark somewhere in the backed up physical dump the last file
>>> needed for the dump to be considered good. Then your deletion
>>> scripts can look at the WAL archive files you have and the last one
>>> required for the dump to be Good and determine if the dump is
>>> Good or
>>> not.
>>>
>>> 2) After doing the physical dump but before calling pg_stop_backup()
>>> just copy the current WAL file to the physical dump. If that file
>>> later gets archived then the restore commands overwrites your
>>> partially completed one so it doesn't hurt but you know that when
>>> you
>>> call pg_stop_backup() that that physical dump is good. (Is it ok to
>>> copy the current WAL file while it is still in use?)
>>>
>>> Is anyone taking one of these or any other precautions to make sure
>>> they've got a good physical dump or does everyone just keep a whole
>>> bunch of dumps around, and then actually restore the dump to see if
>>> it is good and if not go back to a previous dump?
>>>
>>> I hope that makes more sense.
>>>
>>> Thanks,
>>>
>>> Rick
>>>
>>> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:
>>>
>>>> Rick Gigger wrote:
>>>>> Um, no you didn't read my email at all. I am aware of all of that
>>>>> and it is clearly outlined in the docs. My email was about a
>>>>> specific detail in the process. Please read it if you want to
>>>>> know what my actual question was.
>>>>
>>>> I'm not sure your email is quite right as regards the process. You
>>>> need:
>>>> 1. the filesystem backup
>>>> 2. the WAL file indicated in the history-file
>>>> 3. all the WAL files later than that
>>>> to get up to "now".
>>>>
>>>> If you don't want to replay up to "now" then you will not need some
>>>> of the more recent WAL files. You can't afford to throw them away
>>>> though since you've got a rolling backup system running and the
>>>> whole point is so you can recover to any point you like.
>>>>
>>>> You can however throw away any WAL files older than that indicated
>>>> in the history file for your current filesystem-backup. You can
>>>> then only restore from that point in time forward.
>>>>
>>>> There is no "last one" in the WAL set unless you know the time you
>>>> want to restore to. Indeed, the "last one" might not be "full" yet
>>>> and therefore archived if you want to restore to 10 seconds ago.
>>>>
>>>> Or am I mis-understanding your email too?
>>>>
>>>> --
>>>> Richard Huxton
>>>> Archonet Ltd
>>>>
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 4: Have you searched our list archives?
>>>
>>> http://archives.postgresql.org
>>>
>>
>> --
>> Bruce Momjian | http://candle.pha.pa.us
>> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
>> + If your life is a hard drive, | 13 Roberts Road
>> + Christ can be your backup. | Newtown Square,
>> Pennsylvania 19073
>>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
From | Date | Subject | |
---|---|---|---|
Next Message | Devrim GUNDUZ | 2006-01-31 00:48:45 | New project launched : PostgreSQL GUI Installer for Linux/Unix systems |
Previous Message | Rick Gigger | 2006-01-30 23:29:15 | Re: incremental backups |