From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-09-16 19:38:47
Message-ID: 20190916193847.GG6962@tamriel.snowman.net
Lists: pgsql-hackers
Greetings,
* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Mon, Sep 16, 2019 at 1:10 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I disagree with this on a couple of levels. The first is pretty simple-
> > we don't have all of the information. The user may have some reason to
> > believe that timestamp-based is a bad idea, for example, and therefore
> > having an option to perform a checksum-based backup makes sense. rsync
> > is a pretty good tool in my view and it has a very similar option-
> > because there are trade-offs to be made. LSN is great, if you don't
> > mind reading every file of your database start-to-finish every time, but
> > in a running system which hasn't suffered from clock skew or other odd
> > issues (some of which we can also detect), it's pretty painful to scan
> > absolutely everything like that for an incremental.
>
> There's a separate thread on using WAL-scanning to avoid having to
> scan all the data every time. I pointed it out to you early in this
> thread, too.
As discussed nearby, not everything that needs to be included in the
backup is actually going to be in the WAL though, right? How would that
ever be able to handle the case where someone starts the server under
wal_level = logical, takes a full backup, then restarts with wal_level =
minimal, writes out a bunch of new data, and then restarts back to
wal_level = logical and takes an incremental?
How would we even detect that such a thing happened?
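(For illustration of why detection is shaky: about the best a tool could do today, as far as I can see, is scan whatever WAL is still around for a PARAMETER_CHANGE record showing wal_level=minimal, along the lines of the hypothetical sketch below, and that only helps if every segment since the prior backup is still available, which is exactly what can't be assumed. Python, names invented, pg_waldump assumed to be on the PATH.)

    import glob, os, subprocess

    def saw_wal_level_minimal(wal_dir):
        """Hypothetical check: look through retained/archived segments for a
        PARAMETER_CHANGE record that set wal_level=minimal.  Blind if any
        segment from the window has already been recycled or removed."""
        for seg in sorted(glob.glob(os.path.join(wal_dir, '0*'))):
            out = subprocess.run(['pg_waldump', seg],
                                 capture_output=True, text=True)
            if 'wal_level=minimal' in out.stdout:
                return True
        return False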
> > If you track the checksum of the file in the manifest then it's a pretty
> > strong validation that the backup repo hasn't been corrupted between the
> > backup and the restore. Of course, the database could have been
> > corrupted at the source, and perhaps that's what you were getting at
> > with your 'limited extent' but that isn't what I was referring to.
>
> Yeah, that all seems fair. Without the checksum, you can only validate
> that you have the right files and that they are the right sizes, which
> is not bad, but the checksums certainly make it stronger. But,
> wouldn't having to checksum all of the files add significantly to the
> cost of taking the backup? If so, I can imagine that some people might
> want to pay that cost but others might not. If it's basically free to
> checksum the data while we have it in memory anyway, then I guess
> there's little to be lost.
On larger systems, so many of the files are 1GB in size that checking
the file size is quite close to meaningless. Yes, having to checksum
all of the files definitely adds to the cost of taking the backup, but
to avoid it we need strong assurances that a given file hasn't been
changed since our last full backup. WAL, today at least, isn't quite
that, and timestamps can be fooled with, so if you'd like to be
particularly careful, there don't seem to be many alternatives.
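To be clear about where that cost sits: structurally it's just folding a hash update into the read loop the backup is doing anyway, along these lines (a minimal Python sketch, file names and chunk size invented for illustration); the added expense is the CPU spent in the hash, not any extra reads of the data.

    import hashlib

    def copy_with_checksum(src_path, dst_path, chunk_size=1024 * 1024):
        """Stream a file into the backup while computing its checksum; the
        only cost on top of the copy is the hash update on bytes that were
        being read anyway."""
        h = hashlib.sha256()
        with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                h.update(chunk)
                dst.write(chunk)
        return h.hexdigest()  # stored in the manifest next to path and size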
> > I'm pretty baffled by this argument, particularly in this context. We
> > already have tooling around trying to manage WAL archives in core- see
> > pg_archivecleanup. Further, we're talking about pg_basebackup here, not
> > about Netbackup or Tivoli, and the results of a pg_basebackup (that is,
> > a set of tar files, or a data directory) could happily be backed up
> > using whatever Enterprise tool folks want to use- in much the same way
> > that a pgbackrest repo is also able to be backed up using whatever
> > Enterprise tools someone wishes to use. We designed it quite carefully
> > to work with exactly that use-case, so the distinction here is quite
> > lost on me. Perhaps you could clarify what use-case these changes to
> > pg_basebackup solve, when working with a Netbackup or Tivoli system,
> > that pgbackrest doesn't, since you bring it up here?
>
> I'm not an expert on any of those systems, but I doubt that
> everybody's OK with backing everything up to a pgbackrest repository
> and then separately backing up that repository to some other system.
> That sounds like a pretty large storage cost.
I'm not asking you to be an expert on those systems, just to help me
understand the statements you're making. How is backing up to a
pgbackrest repo different from running a pg_basebackup in the context of
using some other Enterprise backup system? In both cases, you'll have a
full copy of the backup (presumably compressed) somewhere out on a disk
or filesystem which is then backed up by the Enterprise tool.
> > As for if we should be sending more to the server, or asking the server
> > to send more to us, I don't really have a good feel for what's "best".
> > At least one implementation I'm familiar with builds a manifest on the
> > PG server side and then compares the results of that to the manifest
> > stored with the backup (where that comparison is actually done is on
> > whatever system the "backup" was started from, typically a backup
> > server). Perhaps there's an argument for sending the manifest from the
> > backup repository to PostgreSQL for it to then compare against the data
> > directory but I'm not really sure how it could possibly do that more
> > efficiently and that's moving work to the PG server that it doesn't
> > really need to do.
>
> I agree with all that, but... if the server builds a manifest on the
> PG server that is to be compared with the backup's manifest, the one
> the PG server builds can't really include checksums, I think. To get
> the checksums, it would have to read the entire cluster while building
> the manifest, which sounds insane. Presumably it would have to build a
> checksum-free version of the manifest, and then the client could
> checksum the files as they're streamed down and write out a revised
> manifest that adds the checksums.
Unless files can be excluded based on some relatively strong criteria,
then yes, the approach would be to use checksums of the files and would
necessarily include all files, meaning that you'd have to read them all.
That's not great, of course, which is why there are trade-offs to be
made, one of which typically involves using timestamps, but doing so
quite carefully, to perform the file exclusion. Other ideas are welcome,
but WAL doesn't seem like a great basis unless we make some changes
there, and we, as in PG, haven't got a robust "we know this file changed
as of this point" mechanism to work from. I worry that we're putting too
much faith in a system to do something independent of what it was
actually built and designed to do, and thinking that because we could
trust it for X, we can trust it for Y.
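To make "doing so quite carefully" a little more concrete, here is roughly the sort of rule I mean, as a purely illustrative Python sketch (the manifest fields and the safety margin are invented for the example): a file is only skipped when its size and mtime both match the prior manifest and its mtime is comfortably older than the prior backup's start time; anything the least bit doubtful gets read and checksummed instead.

    import os

    def can_skip(path, prior, prior_backup_start, slop_seconds=2):
        """prior is {'size': ..., 'mtime': ...} from the last backup's
        manifest, or None if the file wasn't in it.  Skip only when nothing
        suggests the file could have changed; otherwise re-read it."""
        if prior is None:
            return False                       # new file, must copy
        st = os.stat(path)
        if st.st_size != prior['size']:
            return False                       # size changed, must copy
        if st.st_mtime != prior['mtime']:
            return False                       # mtime changed, must copy
        # Even a matching mtime is only trusted when the file was already
        # "old" at the time the prior backup started; files touched right
        # around that boundary are re-read rather than trusted.
        return st.st_mtime < prior_backup_start - slop_seconds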
Thanks,
Stephen