Re: Basic question on recovery and disk snapshotting

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc: Jov <amutu(at)amutu(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Basic question on recovery and disk snapshotting
Date: 2013-05-01 23:56:06
Message-ID: CAMkU=1wLLj728cAAJ7UV40L0sVMGwkBuYUz0dTfKuWHt-5QWPA@mail.gmail.com
Lists: pgsql-general

On Saturday, April 27, 2013, Yang Zhang wrote:

> On Sat, Apr 27, 2013 at 11:55 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> >> My question really boils down to: if we're interested in using COW
> >> snapshotting (a common feature of modern filesystems and hosting
> >> environments), would we necessarily need to ensure the data and
> >> pg_xlog are on the same snapshotted volume?
> >
> >
> > That would certainly make it easier. But it shouldn't be necessary, as
> > long as the xlog snapshot is taken after the cluster snapshot, and also
> > as long as no xlog files which were written to after the last completed
> > checkpoint prior to the cluster snapshot got recycled before the xlog
> > snapshot. As long as the snapshots run quickly and promptly one after
> > the other, this should not be a problem, but you should certainly
> > validate that a snapshot collection has all the xlogs it needs before
> > accepting it as being good. If you find some necessary xlog files are
> > missing, you can turn up wal_keep_segments and try again.
>
> This information is gold, thank you.
>
> How do I validate that a snapshot collection has all the xlogs it needs?
>

I've always validated my backups by practicing restoring them. It seems
like the most rigorous way, and I figure I need the practice. If that
isn't feasible, the backup_label file created by pg_start_backup() will
tell you by name which xlog is the first one you need. Failing that, you
can use pg_controldata to figure it out based on "Latest checkpoint's REDO
location", keeping in mind that the part after the / is not zero-padded on
the left, so if it is "short" you have to take that into account. I have no
actual experience doing this in practice.
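For what it's worth, the arithmetic for turning that REDO location into the
first needed WAL file name can be sketched in shell. This is my own
illustration, not something tested in production; it assumes the default
16 MB WAL segment size, and the timeline id and example location are made up:

```shell
# Sketch: compute the first WAL segment file needed for recovery from
# pg_controldata's "Latest checkpoint's REDO location" (e.g. "0/9A3F2D58").
# Assumes the default 16 MB WAL segment size.
wal_file_from_redo() {
    tli=$1                       # timeline id, usually 1
    redo=$2                      # REDO location as printed, "high/low"
    hi=$(( 0x${redo%/*} ))       # part before the slash
    lo=$(( 0x${redo#*/} ))       # part after the slash, NOT zero-padded
    # With 16 MB segments, the segment number is the low word shifted
    # right by 24 bits; printf re-pads every field to 8 hex digits.
    printf '%08X%08X%08X\n' "$tli" "$hi" $(( lo >> 24 ))
}

wal_file_from_redo 1 0/9A3F2D58   # -> 00000001000000000000009A
```

The re-padding in printf is exactly the "short" issue mentioned above: a
REDO location of 0/9A3F2D58 still names a 24-hex-digit segment file.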

> >
> >
> > Your goal seems to be to *avoid* continuous archiving, so I wouldn't
> > expect that part of the docs to touch on your issues. But see the
> > section "Standalone Hot Backups" which would allow you to use snapshots
> > for the cluster "copy" part, and normal archiving for just the xlogs.
> > The volume of pg_xlog should be fairly small, so this seems to me like
> > an attractive option.
>
> Just to validate my understanding, are the two options as follows?
>
> a. Checkpoint (optional but helps with time window?), snapshot
> tablespaces/cluster/xlog, validate all necessary xlogs present.
>

Yes. And doing the checkpoint immediately before has two good effects: it
makes the recovery faster, and it maximizes the time until the WAL logs you
need start to be recycled.
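The "validate all necessary xlogs present" step in (a) could be sketched
roughly as below. This is my own sketch, not something from the thread:
the function name, directory, and segment names are hypothetical, and it
assumes 16 MB segments plus the pre-9.3 rule that segment FF of each log
file is never written:

```shell
# Sketch: check that a snapshot's pg_xlog directory contains an unbroken
# run of segments from the first one recovery needs through the last.
check_xlogs() {
    dir=$1; first=$2; last=$3
    f=$first
    while :; do
        [ -f "$dir/$f" ] || { echo "missing $f"; return 1; }
        [ "$f" = "$last" ] && break
        # Split the 24-hex-digit name into timeline / high word / segment.
        tli=${f%????????????????}; rest=${f#????????}
        hi=$(( 0x${rest%????????} )); seg=$(( 0x${rest#????????} ))
        if [ "$seg" -eq 254 ]; then   # pre-9.3 never writes segment FF
            hi=$(( hi + 1 )); seg=0
        else
            seg=$(( seg + 1 ))
        fi
        f=$(printf '%s%08X%08X' "$tli" "$hi" "$seg")
    done
    echo ok
}

# Hypothetical usage against a mounted snapshot:
# check_xlogs /mnt/snapshot/pg_xlog \
#     00000001000000000000009A 00000001000000000000009D
```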

>
> b. Set wal_level/archive_mode/archive_command, pg_start_backup,
> snapshot tablespaces/cluster, pg_stop_backup to archive xlog.
>
> (a) sounds more appealing since it's treating recovery as crash
> recovery rather than backup restore, and as such seems simpler and
> lower-overhead (e.g. WAL verbosity, though I don't know how much that
> overhead is).

That brings up another point to consider. If wal_level is minimal, then
tables which you bulk load in the same transaction that created them or
truncated them will not get any WAL records written for the data. (That is
the main reason the WAL verbosity is reduced.) But that also means that if
any of those operations is happening while you are taking your snapshots,
those operations will be corrupted. If the data and xlogs were part of the
same atomic snapshot, this would not be a problem, as either the operation
completed or it never happened. But with different snapshots, the data can
get partially but not completely into the data snapshot, while the xlog
record which says the data was completely written does get into the xlog
snapshot.

If the system is very busy with different people doing different operations
at the same time, it could be hard to coordinate a time to take the backup.
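For concreteness, the kind of operation at risk is the WAL-skipping bulk
load pattern; table name and file path here are placeholders:

```
-- With wal_level = minimal, the COPY below writes no WAL for the table
-- data, because the table was created in the same transaction.
BEGIN;
CREATE TABLE staging (id int, payload text);
COPY staging FROM '/tmp/staging.csv' WITH (FORMAT csv);
COMMIT;
```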

> However, I'm not sure how complex that validation step
> is.
>
>
The more I think about it, the more I doubt it is worth it. It would be
very easy to come up with a method that sort of works, some of the time,
and then blows up spectacularly for no apparent reason. Using the
archiving, at least over the period during which the backup is being taken,
is the truly supported way of doing it. You can still use a snapshot of
the data directory, but you are really just treating it as a fast copy
operation. That it is atomic is not important to the process.
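For reference, the archiving setup that option (b) needs on a 9.1/9.2-era
server looks roughly like this; the /archive path is a placeholder:

```
# postgresql.conf
wal_level = archive
archive_mode = on                 # requires a server restart
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
```

Then bracket the snapshot itself with `SELECT pg_start_backup('label');`
before and `SELECT pg_stop_backup();` after.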

> >
> > If you really don't want to use archiving, even just during the
> > duration of the cluster snapshotting, then this is the part that
> > addresses your questions:
> >
> > http://www.postgresql.org/docs/9.1/static/backup-file.html
>
> I'm still interested in online backups, though - stopping the DB is a
> no-go unfortunately.
>

I wonder if initiating an EBS snapshot of a large volume isn't going to
freeze the database up for a while anyway, just by starving it of IO. It
would be interesting to hear back on your experiences.

Cheers,

Jeff
