From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-17 22:43:10
Message-ID: 20190417224309.GH6197@tamriel.snowman.net
Lists: pgsql-hackers
Greetings,
* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Mon, Apr 15, 2019 at 9:01 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I love the general idea of having additional facilities in core to
> > support block-level incremental backups. I've long been unhappy that
> > any such approach ends up being limited to a subset of the files which
> > need to be included in the backup, meaning the rest of the files have to
> > be backed up in their entirety. I don't think we have to solve for that
> > as part of this, but I'd like to see a discussion for how to deal with
> > the other files which are being backed up to avoid needing to just
> > wholesale copy them.
>
> Ideas? Generally, I don't think that anything other than the main
> forks of relations is worth worrying about, because the files are too
> small to really matter. Even if they're big, the main forks of
> relations will be much bigger. I think.
Sadly, I haven't got any great ideas today. I do know that the WAL-G
folks have specifically mentioned issues with the visibility map being
large enough across enough of their systems that it kinda sucks to deal
with. Perhaps we could do something like the rsync binary-diff protocol
for non-relation files? This is clearly just hand-waving but maybe
there's something reasonable in that idea.
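To make that hand-waving slightly more concrete, here's a minimal
sketch of the rsync-style signature step- weak plus strong checksums
per fixed-size chunk, so only chunks whose checksums differ need to be
shipped. The chunk size and helper names are made up for illustration,
and the weak sum is just a stand-in for rsync's real rolling checksum:

    # Minimal sketch of the rsync-style "signature" step for a
    # non-relation file: compute weak+strong checksums per chunk; the
    # other side then only ships chunks whose checksums don't match.
    import hashlib

    CHUNK = 8192

    def file_signature(path):
        sigs = []
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(CHUNK)
                if not chunk:
                    break
                weak = sum(chunk) & 0xffffffff      # stand-in rolling sum
                strong = hashlib.md5(chunk).hexdigest()
                sigs.append((weak, strong))
        return sigs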
> > I'm quite concerned that trying to graft this on to pg_basebackup
> > (which, as you note later, is missing an awful lot of what users expect
> > from a real backup solution already- retention handling, parallel
> > capabilities, WAL archive management, and many more... but also is just
> > not nearly as developed a tool as the external solutions) is going to
> > make things unnecessarily difficult when what we really want here is
> > better support from core for block-level incremental backup for the
> > existing external tools to leverage.
> >
> > Perhaps there's something here which can be done with pg_basebackup to
> > have it work with the block-level approach, but I certainly don't see
> > it as a natural next step for it, and it really does seem like limiting the
> > way this is implemented to something that pg_basebackup can easily
> > digest might make it less useful for the more developed tools.
>
> I agree that there are a bunch of things that pg_basebackup does not
> do, such as backup management. I think a lot of users do not want
> PostgreSQL to do backup management for them. They have an existing
> solution that they use to manage backups, and they want PostgreSQL to
> interoperate with it. I think it makes sense for pg_basebackup to be
> in charge of taking the backup, and then other tools can either use it
> as a building block or use the streaming replication protocol to send
> approximately the same commands to the server.
There are at least six different backup tools for PostgreSQL that
provide backup management, so I have a really hard time
agreeing with this idea that users don't want a PG backup management
system. Maybe that's not what you're suggesting here, but that's what
came across to me.
Yes, there are some users who have an existing backup solution and
they'd like a better way to integrate PostgreSQL into that solution,
but that's usually something like filesystem snapshots or an enterprise
backup tool which has a PostgreSQL agent or similar to do the start/stop
and collect up the WAL, not something that's just calling pg_basebackup.
Those are typically not things we have any visibility into, though, and
they aren't open source either (and, at least as often as not, they don't
seem to be very well thought through, based on my experience with those
tools...).
Unless maybe I'm misunderstanding and what you're suggesting here is
that the "existing solution" is something like the external PG-specific
backup tools? But then the rest doesn't seem to make sense, as only
maybe one or two of those tools use pg_basebackup internally.
> I certainly would not
> want to expose server capabilities that let you take an incremental
> backup and NOT teach pg_basebackup to use them -- then we'd be in a
> situation of saying that PostgreSQL has incremental backup, but you
> have to get external tool XYZ to use it. That will be perceived as
> PostgreSQL does NOT have incremental backup and this external tool
> adds it.
... but this is exactly the situation we're in already with all of the
*other* features around backup (parallel backup, backup management, WAL
management, etc). Users want those features, pg_basebackup/PG core
doesn't provide them, and therefore there's a bunch of other tools which
have been written that do. In addition, saying that PG has incremental
backup but no built-in management of those full-vs-incremental backups
and telling users that they basically have to build that themselves
really feels a lot like we're trying to address a check-box requirement
rather than making something that our users are going to be happy with.
> > As an example, I believe all of the other tools mentioned (at least,
> > I'm pretty sure all of the open source ones do) support parallel
> > backup and therefore having a way to get the block-level changes in a
> > parallel fashion would be a pretty big thing that those tools will want
> > and pg_basebackup is single-threaded today and this proposal doesn't
> > seem to be contemplating changing that, implying that a serial-based
> > block-level protocol would be fine, but that'd be a pretty awful
> > restriction for the other tools.
>
> I mentioned this exact issue in my original email. I spoke positively
> of it. But I think it is different from what is being proposed here.
> We could have parallel backup without incremental backup, and that
> would be a good feature. We could have incremental backup without
> parallel backup, and that would also be a good feature. We could also have
> both, which would be best of all. I don't see that my proposal throws
> up any architectural obstacle to parallelism. I assume parallel
> backup, whether full or incremental, would be implemented by dividing
> up the files that need to be sent across the available connections; if
> incremental backup exists, each connection then has to decide whether
> to send the whole file or only part of it.
I don't think that I was very clear in what my specific concern here
was. I'm not asking for pg_basebackup to have parallel backup (at
least, not in this part of the discussion), I'm asking for the
incremental block-based protocol that's going to be built-in to core to
be able to be used in a parallel fashion.
The existing protocol that pg_basebackup uses is basically: connect to
the server and say "please give me a tarball of the data directory",
and that is then streamed on that connection, making that protocol
impossible to use for parallel backup. That's fine as far as it goes
because only pg_basebackup actually uses that protocol (note that nearly
all of the other tools for doing backups of PostgreSQL don't...). If
we're expecting the external tools to use the block-level incremental
protocol, though, then that protocol really needs to have a way to be
parallelized; otherwise we're just going to end up with each of the
individual tools doing its own thing for block-level incremental
(perhaps by reimplementing whatever is done in core, but in a way that
they can parallelize it). And that's only if such reimplementation is
possible at all- if the block-level incremental backup has to
coordinate with the backend in some fashion to work, then *everyone*
would have to use the protocol even though it isn't parallel, and that
would be really bad, imv.
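To make that concrete, here's a rough sketch of what a parallelizable
block-level protocol could look like from a tool's perspective. The
commands INCREMENTAL_MANIFEST and SEND_BLOCKS are entirely made up-
nothing like them exists in the replication protocol today- the point
is only that the manifest and the block fetches are separable, so N
connections can share the work:

    # Hypothetical sketch of a parallelizable block-level protocol: one
    # connection asks for the manifest of changed files/blocks, then N
    # worker connections each fetch a slice.  None of these commands
    # exist today.
    from concurrent.futures import ThreadPoolExecutor

    def take_incremental_backup(connect, prior_lsn, workers=4):
        ctl = connect()
        # Made-up command returning (filename, changed-block-list) pairs.
        manifest = ctl.execute("INCREMENTAL_MANIFEST SINCE %s" % prior_lsn)

        def fetch(entry):
            conn = connect()        # each worker gets its own connection
            fname, blocks = entry
            # Made-up command streaming just the requested blocks.
            return fname, conn.execute(
                "SEND_BLOCKS %s %s" % (fname, ",".join(map(str, blocks))))

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fetch, manifest))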
> > This part of the discussion is another example of how we're limiting
> > ourselves in this implementation to the "pg_basebackup can work with
> > this" case- by only considering the options of "scan all the files" or
> > "use the WAL- if the request is for WAL we have available on the
> > server." The other backup solutions mentioned in your initial email,
> > and others that weren't, have a WAL archive which includes a lot more
> > WAL than just what the primary currently has. When I've thought about
> > how WAL could be used to build a differential or incremental backup, the
> > question of "do we have all the WAL we need" hasn't ever been a
> > consideration- because the backup tool manages the WAL archive and has
> > WAL going back across, most likely, weeks or even months. Having a tool
> > which can essentially "compress" WAL would be fantastic and would be
> > able to be leveraged by all of the different backup solutions.
>
> I don't think this is a case of limiting ourselves; I think it's a
> case of keeping separate considerations properly separate. As I said
> in my original email, the client doesn't really need to know how the
> server is identifying the blocks that have been modified. That is the
> server's job. I started a separate thread on the WAL-scanning
> approach, so we should take that part of the discussion over there. I
> see no reason why the server couldn't be taught to reach back into an
> available archive for WAL that it no longer has locally, but that's
> really independent of the design ideas being discussed on this thread.
I've provided thoughts on that other thread, I'm happy to discuss
further there.
> > Two things here- having some file that "stops the server from starting"
> > is just going to cause a lot of pain, in my experience. Users do a lot
> > of really rather.... curious things, and then come asking questions
> > about them, and removing the file that stopped the server from starting
> > is going to quickly become one of those Stack Overflow questions where
> > people just follow the highest-ranked answer, even though everyone
> > who follows this list will know that doing so results in corruption of
> > the database.
>
> Wait, you want to make it maximally easy for users to start the server
> in a state that is 100% certain to result in a corrupted and unusable
> database? Why?? I'd like to make that a tiny bit difficult. If
> they really want a corrupted database, they can remove the file.
No, I don't want it to be easy for users to start the server in a state
that's going to result in a corrupted cluster. That's basically the
complete opposite of what I was going for- having a file that can be
trivially removed to start up the cluster is *going* to result in people
having corrupted clusters, no matter how much we tell them "don't do
that". This is exactly the problem with have with backup_label today.
I'd really rather not double-down on that.
> > An alternative approach in developing this feature would be to have
> > pg_basebackup have an option to run against an *existing* backup, with
> > the entire point being that the existing backup is updated with these
> > incremental changes, instead of having some independent tool which takes
> > the result of multiple pg_basebackup runs and then combines them.
>
> That would be really unsafe, because if the tool is interrupted before
> it finishes (and fsyncs everything), you no longer have any usable
> backup. It also doesn't lend itself to several of the scenarios I
> described in my original email -- like endless incrementals that are
> merged into the full backup after some number of days -- a capability
> upon which others have already remarked positively.
There are really two things here. The first is that I agree with the
concern about potentially destroying the existing backup if the
pg_basebackup run doesn't complete, but there are some ways to address
that (such as filesystem snapshotting), so I'm not sure the idea is
quite that bad; it would need to be more than just what pg_basebackup
does in this case in order to be trustworthy (at least, for most
users).
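As a sketch of one such mitigation (a hypothetical layout, and
emphatically not what pg_basebackup does today): stage the incoming
files next to the backup, fsync them, and only rename them into place
once they're durable, so an interrupted transfer leaves the prior
backup intact:

    # Sketch of an interruption-safer "update the existing backup" step.
    # A crash before the renames leaves the old backup untouched; a crash
    # partway through the rename loop still leaves a mixed state, which
    # is why a manifest or snapshot on top of this would be needed for it
    # to be truly trustworthy.  Hypothetical layout, not pg_basebackup.
    import os

    def apply_incremental(backup_dir, incoming_files):
        stage = os.path.join(backup_dir, '.incoming')
        os.makedirs(stage, exist_ok=True)
        for relpath, data in incoming_files:   # (path, bytes) pairs
            tmp = os.path.join(stage, relpath.replace('/', '_'))
            with open(tmp, 'wb') as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())           # durable before the swap
            # Atomic within a filesystem; the old file survives until here.
            os.replace(tmp, os.path.join(backup_dir, relpath))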
The other part here is the idea of endless incrementals, where the
blocks which don't appear to have changed are never re-validated
against what's in the backup. Unfortunately, latent corruption happens
and you really want to have a way to check for that. In past
discussions that I've had with David, there's been some idea to check,
for each backup, some percentage of the blocks that didn't appear to
change against what's in the backup.
I share this just to point out that there's some risk to that approach,
not to say that we shouldn't do it or that we should discourage the
development of such a feature.
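For illustration, that spot-check might look something like this
(hypothetical helpers, and not a description of what pgBackRest
actually does):

    # Sketch of re-validating a random sample of "unchanged" blocks
    # against the copies in the repository, to catch latent corruption.
    import hashlib, random

    BLCKSZ = 8192

    def spot_check(live_file, backed_up_file, skipped_blocks, sample_pct=2):
        if not skipped_blocks:
            return []
        n = max(1, len(skipped_blocks) * sample_pct // 100)
        mismatched = []
        with open(live_file, 'rb') as live, open(backed_up_file, 'rb') as old:
            for blkno in random.sample(skipped_blocks, n):
                live.seek(blkno * BLCKSZ)
                old.seek(blkno * BLCKSZ)
                if (hashlib.sha1(live.read(BLCKSZ)).digest() !=
                        hashlib.sha1(old.read(BLCKSZ)).digest()):
                    mismatched.append(blkno)
        return mismatched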
> > An alternative tool might be one which simply reads the WAL and keeps
> > track of the FPIs and the updates and then eliminates any duplication
> > which exists in the set of WAL provided (that is, multiple FPIs for the
> > same page would be merged into one, and only the delta changes to that
> > page are preserved, across the entire set of WAL being combined). Of
> > course, that's complicated by having to deal with the other files in the
> > database, so it wouldn't really work on its own.
>
> You've jumped back to solving the server's problem (which blocks
> should I send?) rather than the client's problem (what does an
> incremental backup look like once I've taken it and how do I manage
> and restore them?). It does seem possible to figure out the contents
> of modified blocks strictly from looking at the WAL, without any
> examination of the current database contents. However, it also seems
> very complicated, because the tool that is figuring out the current
> block contents just by looking at the WAL would have to know how to
> apply any type of WAL record, not just one that contains an FPI. And
> I really don't want to build a client-side tool that knows how to
> apply WAL.
Wow. I have to admit that I feel completely the opposite- I'd
*love* to have an independent tool (which ideally uses the same code
through the common library, or similar) that can be run to apply WAL.
In other words, I don't agree that it's the server's problem at all to
solve that, or, at least, I don't believe that it needs to be.
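To sketch what the WAL-"compression" idea from above might look like
(the record structure here is made up- a real tool would reuse the
server's WAL-reading code through the common library, as suggested):

    # Sketch of "compressing" a WAL range down to one image per page:
    # keep only the newest full-page image for each (relfilenode, fork,
    # block), plus any records newer than that image which touch the
    # page.  The record objects are hypothetical.
    def compress_wal(records):
        latest_fpi = {}   # (relfilenode, fork, blkno) -> newest FPI record
        deltas = {}       # same key -> records newer than the kept FPI
        for rec in records:                    # assumed ordered by LSN
            key = (rec.relfilenode, rec.fork, rec.blkno)
            if rec.has_full_page_image:
                latest_fpi[key] = rec          # newer image supersedes older
                deltas[key] = []               # older deltas become redundant
            else:
                deltas.setdefault(key, []).append(rec)
        return latest_fpi, deltas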
> > I'd really prefer that we avoid adding in another low-level tool like
> > the one described here. Users, imv anyway, don't want to deal with
> > *more* tools for handling this aspect of backup/recovery. If we had a
> > tool in core today which managed multiple backups, kept track of them,
> > and all of the WAL during and between them, then we could add options to
> > that tool to do what's being described here in a way that makes sense
> > and provides a good interface to users. I don't know that we're going
> > to be able to do that with pg_basebackup when, really, the goal here
> > isn't actually to make pg_basebackup into an enterprise backup tool,
> > it's to make things easier for the external tools to do block-level
> > backups.
>
> Well, I agree with you that the goal is not to make pg_basebackup an
> enterprise backup tool. However, I don't see teaching it to take
> incremental backups as opposed to that goal. I think backup
> management and retention should remain firmly outside the purview of
> pg_basebackup and left either to some other in-core tool or maybe even
> to out-of-core tools. However, I don't see any reason why the
> task of taking an incremental and/or parallel backup should also be
> left to another tool.
I've tried to outline how the incremental backup capability and backup
management are really very closely related, and having those be
implemented by independent tools is not a good interface for our users
to have to live with.
> There is a very close relationship between the thing that
> pg_basebackup already does (copy everything) and the thing that we
> want to do here (copy everything except blocks that we know haven't
> changed). If we made it the job of some other tool to take parallel
> and/or incremental backups, that other tool would need to reimplement
> a lot of things that pg_basebackup has already got, like tar vs. plain
> format, fast vs. spread checkpoint, rate-limiting, compression levels,
> etc. That seems like a waste. Better to give pg_basebackup the
> capability to do those things, and then any backup management tool
> that anyone writes can take advantage of those capabilities.
I don't believe any of the external tools which do backups of PostgreSQL
support tar format. Fast-vs-spread checkpointing isn't in the purview
of the external tools, they just have to accept the option and pass it
to pg_start_backup(), which they already know how to do. Rate-limiting
and compression are implemented by those other tools already, where it's
been desired.
Most of the external tools don't use pg_basebackup, nor the base backup
protocol (or, if they do, it's only as an option among others). In my
opinion, that's a pretty clear indication that pg_basebackup and the base
backup protocol aren't sufficient to cover any but the simplest of
use-cases (though those simple use-cases are handled rather well).
We're talking about adding on a capability that's much more complicated
and is one that a lot of tools have already taken a stab at, let's try
to do it in a way that those tools can leverage it and avoid having to
implement it themselves.
> I come at this, BTW, from the perspective of having just spent a bunch
> of time working on EDB's Backup And Recovery Tool (BART). That tool
> works in exactly the manner you seem to be advocating: it knows how to
> do incremental and parallel full backups, and it also does backup
> management. However, this has not turned out to be the best division
> of labor. People who don't want to use the backup management
> capabilities may still want the parallel or incremental backup
> capabilities, and if all of that is within the envelope of an
> "enterprise backup tool," they don't have that option. So I want to
> split it up. I want pg_basebackup to take all the kinds of backups
> that PostgreSQL supports -- full, incremental, parallel, serial,
> whatever -- and I want some other tool -- pgBackRest, BART, barman, or
> some yet-to-be-invented core thing to do the management of those
> backups. Then everybody can use exactly the bits they want.
I come at this from years of working with David on pgBackRest, listening
to what users want, what features they like, what they'd like to see
added, and what they don't like about how it works today.
It's an interesting idea to add in everything to pg_basebackup that
users doing backups would like to see, but that's quite a list:
- full backups
- differential backups
- incremental backups / block-level backups
- (server-side) compression
- (server-side) encryption
- page-level checksum validation
- calculating checksums (on the whole file)
- external object storage (S3, et al.)
- more things...
I'm really not convinced that I agree with the division of labor as
you've outlined it, where all of the above is done by pg_basebackup
and just archiving and backup retention are handled by some external
tool (except that we already have pg_receivewal, so archiving isn't
really an externally handled thing either, unless you want features like
parallel archive-push or parallel archive-get...).
What would really help me, at least, understand the idea here would be
to understand exactly what the existing tools do that the subset of
users you're thinking about doesn't like/want, but which pg_basebackup,
today, does. Is the issue that there's a repository instead of just a
plain PG directory or set of tar files, like what pg_basebackup produces
today? But how would we do things like have compression, or encryption,
or block-based incremental backups without some kind of repository or
directory that doesn't actually look exactly like a PG data directory?
Another thing I really don't understand from this discussion, and part of
why it's taken me a while to respond, is this, from above:
> I think a lot of users do not want
> PostgreSQL to do backup management for them.
Followed by:
> I come at this, BTW, from the perspective of having just spent a bunch
> of time working on EDB's Backup And Recovery Tool (BART). That tool
> works in exactly the manner you seem to be advocating: it knows how to
> do incremental and parallel full backups, and it also does backup
> management.
I certainly can understand that there are PostgreSQL users who want to
leverage incremental backups without having to use BART or another tool
outside of whatever enterprise backup system they've got, but surely
there's also a large pool of users who *do* want a PG backup tool that manages
backups, or you wouldn't have spent a considerable amount of your very
valuable time hacking on BART. I've certainly seen a fair share of both
and I don't think we should set out to exclude either.
Perhaps that's what we're both saying too and just talking past each
other, but I feel like the approach here is "make it work just for the
simple pg_basebackup case and not worry too much about the other tools,
since what we do for pg_basebackup will work for them too" while where
I'm coming from is "focus on what the other tools need first, and then
make pg_basebackup work with that if there's a sensible way to do so."
A third possibility is that it's just too early to be talking about
this, since it means we've gotta be awfully vague about it.
Thanks!
Stephen