Re: block-level incremental backup

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-09-16 17:10:50
Message-ID: 20190916171050.GD6962@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Mon, Sep 16, 2019 at 10:38 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > In a number of cases, trying to make sure that on a failover or copy of
> > the backup the next 'incremental' is really an 'incremental' is
> > dangerous. A better strategy to address this, and the other issues
> > realized on this thread recently, is to:
> >
> > - Have a manifest of every file in each backup
> > - Always back up new files that weren't in the prior backup
> > - Keep a checksum of each file
> > - Track the timestamp of each file as of when it was backed up
> > - Track the file size of each file
> > - Track the starting timestamp of each backup
> > - Always include files with a modification time after the starting
> > timestamp of the prior backup, or if the file size has changed
> > - In the event of any anomalies (which includes things like a timeline
> > switch), use checksum matching (aka 'delta checksum backup') to
> > perform the backup instead of using timestamps (or just always do that
> > if you want to be particularly careful- having an option for it is
> > great)
> > - Probably other things I'm not thinking of off-hand, but this is at
> > least a good start. Make sure to checksum this information too.
>
> I agree with some of these ideas but not all of them. I think having
> a backup manifest is a good idea; that would allow taking a new
> incremental backup to work from the manifest rather than the data
> directory, which could be extremely useful, because it might be a lot
> faster and the manifest could also be copied to a machine other than
> the one where the entire backup is stored. If the backup itself has
> been pushed off to S3 or whatever, you can't access it quickly, but
> you could keep the manifest around.

Yes, those are also good reasons for having a manifest.

> I also agree that backing up all files that weren't in the previous
> backup is a good strategy. I proposed that fairly explicitly a few
> emails back; but also, the contrary is obviously nonsense. And I also
> agree with, and proposed, that we record the size along with the file.

Sure, I didn't mean to imply that there was something wrong with that.
Including the checksum and other metadata is also valuable, both for
helping to identify corruption in the backup archive and for forensics,
if not for other reasons.
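
To make that concrete, here's a rough sketch (Python, purely illustrative-
this isn't pgbackrest's or pg_basebackup's actual manifest format, and the
field names and the choice of SHA-256 are just assumptions) of the kind of
per-file metadata being discussed, including checksumming the manifest
itself as suggested above:

    import hashlib, json, os

    def file_entry(path):
        # record path, size, mtime, and a checksum for one backed-up file
        st = os.stat(path)
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        return {'path': path, 'size': st.st_size,
                'mtime': st.st_mtime, 'sha256': h.hexdigest()}

    def write_manifest(entries, backup_start, out_path):
        # checksum the manifest contents too, so the manifest itself
        # can be validated later
        body = json.dumps({'backup_start': backup_start, 'files': entries},
                          sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        with open(out_path, 'w') as f:
            f.write(body + '\n' + digest + '\n')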

> I don't really agree with your comments about checksums and
> timestamps. I think that, if possible, there should be ONE method of
> determining whether a block has changed in some important way, and I
> think if we can make LSN work, that would be for the best. If you use
> multiple methods of detecting changes without any clearly-defined
> reason for so doing, maybe what you're saying is that you don't really
> believe that any of the methods are reliable but if we throw the
> kitchen sink at the problem it should come out OK. Any bugs in one
> mechanism are likely to be masked by one of the others, but that's not
> as good as one method that is known to be altogether reliable.

I disagree with this on a couple of levels. The first is pretty simple-
we don't have all of the information. The user may have some reason to
believe that timestamp-based is a bad idea, for example, and therefore
having an option to perform a checksum-based backup makes sense. rsync
is a pretty good tool in my view and it has a very similar option-
because there are trade-offs to be made. LSN is great, if you don't
mind reading every file of your database start-to-finish every time, but
in a running system which hasn't suffered from clock skew or other odd
issues (some of which we can also detect), it's pretty painful to scan
absolutely everything like that for an incremental.
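
As a rough illustration of the selection rule I'm describing (again just a
sketch- the entry layout follows the hypothetical manifest format above,
not any tool's actual logic):

    # prior_manifest: dict mapping path -> prior backup's entry
    # prior_start: timestamp at which the prior backup began
    # delta=True forces checksum comparison instead of trusting timestamps
    def needs_backup(entry, prior_manifest, prior_start, delta=False):
        prior = prior_manifest.get(entry['path'])
        if prior is None:
            return True                      # new file: always back it up
        if delta:
            return entry['sha256'] != prior['sha256']
        if entry['size'] != prior['size']:
            return True
        # include anything modified at or after the prior backup's start
        return entry['mtime'] >= prior_start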

Perhaps the discussion has already moved on to having some way of our
own to track if a given file has changed without having to scan all of
it- if so, that's a discussion I'd be interested in. I'm not against
other approaches here besides timestamps if there's a solid reason why
they're better and they're also able to avoid scanning the entire
database.

> > By having a manifest for each backed up file for each backup, you also
> > gain the ability to validate that a backup in the repository hasn't been
> > corrupted post-backup, a feature that at least some other database
> > backup and restore systems have (referring specifically to the big O in
> > this particular case, but I bet others do too).
>
> Agreed. The manifest only lets you validate to a limited extent, but
> that's still useful.

If you track the checksum of each file in the manifest, then it's a pretty
strong validation that the backup repo hasn't been corrupted between the
backup and the restore. Of course, the database could have been corrupted
at the source, and perhaps that's what you were getting at with your
'limited extent', but that isn't what I was referring to.

Claiming that the backup has been 'validated' by only looking at file
sizes certainly wouldn't be acceptable. I can't imagine you were
suggesting that, since you're certainly capable of realizing it, but I got
the feeling you weren't agreeing that including the checksum of each file
in the manifest made sense, so I feel like I'm missing something here.
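
For what it's worth, the post-backup validation I mean is essentially just
recomputing each file's checksum in the repository and comparing against
the manifest- a sketch, assuming the same hypothetical manifest entries as
above (and note this catches corruption introduced after the backup was
taken, not corruption already present in the source database):

    import hashlib, os

    def verify_backup(manifest_files, repo_dir):
        # return the paths whose stored contents no longer match the
        # checksums recorded at backup time
        problems = []
        for entry in manifest_files:
            h = hashlib.sha256()
            with open(os.path.join(repo_dir, entry['path']), 'rb') as f:
                for chunk in iter(lambda: f.read(1 << 20), b''):
                    h.update(chunk)
            if h.hexdigest() != entry['sha256']:
                problems.append(entry['path'])
        return problems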

> > Having a system of keeping track of which backups are full and which are
> > differential in an overall system also gives you the ability to do
> > things like expiration in a sensible way, including handling WAL
> > expiration.
>
> True, but I'm not sure that functionality belongs in core. It
> certainly needs to be possible for out-of-core code to do this part of
> the work if desired, because people want to integrate with enterprise
> backup systems, and we can't come in and say, well, you back up
> everything else using Netbackup or Tivoli, but for PostgreSQL you have
> to use pg_backrest. I mean, maybe you can win that argument, but I
> know I can't.

I'm pretty baffled by this argument, particularly in this context. We
already have tooling around trying to manage WAL archives in core- see
pg_archivecleanup. Further, we're talking about pg_basebackup here, not
about Netbackup or Tivoli, and the results of a pg_basebackup (that is,
a set of tar files, or a data directory) could happily be backed up
using whatever Enterprise tool folks want to use- in much the same way
that a pgbackrest repo is also able to be backed up using whatever
Enterprise tools someone wishes to use. We designed it quite carefully
to work with exactly that use-case, so the distinction here is quite
lost on me. Perhaps you could clarify what use-case these changes to
pg_basebackup solve, when working with a Netbackup or Tivoli system,
that pgbackrest doesn't, since you bring it up here?

> > I'd like to clarify that while I would like to have an easier way to
> > parallelize backups, that's a relatively minor complaint- the much
> > bigger issue that I have with this feature is that trying to address
> > everything correctly while having only the amount of information that
> > could be passed on the command-line about the prior full/incremental is
> > going to be extremely difficult, complicated, and likely to lead to
> > subtle bugs in the actual code, and probably less than subtle bugs in
> > how users end up using it, since they'll have to implement the
> > expiration and tracking of information between backups themselves
> > (unless something's changed in that part during this discussion- I admit
> > that I've not read every email in this thread).
>
> Well, the evidence seems to show that you are right, at least to some
> extent. I consider it a positive good if the client needs to give the
> server only a limited amount of information. After all, you could
> always take an incremental backup by shipping every byte of the
> previous backup to the server, having it compare everything to the
> current contents, and having it then send you back the stuff that is
> new or different. But that would be dumb, because most of the point of
> an incremental backup is to save on sending lots of data over the
> network unnecessarily. Now, it seems that I took that goal to an
> unhealthy extreme, because as we've now realized, sending only an LSN
> and nothing else isn't enough to get a correct backup. So we need to
> send more, and it doesn't have to be the absolutely most
> stripped-down, bare-bones version of what could be sent. But it should
> be fairly minimal, I think; that's kinda the point of the feature.

Right- much of the point of an incremental backup feature is to try and
minimize the amount of work that's done while still getting a good
backup. I don't agree that we should focus solely on network bandwidth,
as there are also trade-offs around disk bandwidth to consider- see the
discussion above regarding timestamps vs. checksumming every file.

As for whether we should be sending more to the server, or asking the
server to send more to us, I don't really have a good feel for what's "best".
At least one implementation I'm familiar with builds a manifest on the
PG server side and then compares the results of that to the manifest
stored with the backup (where that comparison is actually done is on
whatever system the "backup" was started from, typically a backup
server). Perhaps there's an argument for sending the manifest from the
backup repository to PostgreSQL for it to then compare against the data
directory, but I'm not really sure how it could possibly do that more
efficiently, and it's moving work onto the PG server that it doesn't
really need to do.
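
Roughly, the flow I'm describing looks something like this (another
illustrative sketch, reusing needs_backup() from the earlier one- not any
particular tool's actual API):

    def plan_incremental(current_entries, prior_manifest, prior_start,
                         delta=False):
        # current_entries: manifest built against the live data directory
        # prior_manifest: manifest stored with the previous backup
        to_fetch = [e['path'] for e in current_entries
                    if needs_backup(e, prior_manifest, prior_start, delta)]
        # files present in the prior backup but gone from the data
        # directory need to be noted so a restore doesn't resurrect them
        current_paths = {e['path'] for e in current_entries}
        removed = [p for p in prior_manifest if p not in current_paths]
        return to_fetch, removed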

Thanks,

Stephen
