Re: block-level incremental backup

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-24 16:57:36
Message-ID: 20190424165736.GU6197@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Wed, Apr 24, 2019 at 9:28 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > At least in part then it seems like we're viewing the level of effort
> > around what I'm talking about quite differently, and I feel like that's
> > largely because every time I mention parallel anything there's this
> > assumption that I'm asking you to parallelize pg_basebackup or write a
> > whole bunch more code to provide a fully optimized server-side parallel
> > implementation for backups. That really wasn't what I was going for. I
> > was thinking it would be a modest amount of additional work to add
> > incremental backup via a few new commands, instead of through the
> > BASE_BACKUP protocol command, that would make parallelization possible.
>
> I'm not sure about that. It doesn't seem crazy difficult, but there
> are a few wrinkles. One is that if the client is requesting files one
> at a time, it's got to have a list of all the files that it needs to
> request, and that means that it has to ask the server to make a
> preparatory pass over the whole PGDATA directory to get a list of all
> the files that exist. That overhead is not otherwise needed. Another
> is that the list of files might be really large, and that means that
> the client would either use a lot of memory to hold that great big
> list, or need to deal with spilling the list to a spool file
> someplace, or else have a server protocol that lets the list be
> fetched incrementally in chunks.

So, I had a thought about that when I was composing the last email and,
while I'm still unsure about it, maybe it'd be useful to mention it
here: do we really need a list of every *file*, or could we reduce that
down to a list of relations + forks for the main data directory, and
then always include whatever other directories/files are appropriate?

When it comes to operating in chunks, well, if we're getting a list of
relations instead of files, we do have this thing called cursors..
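To sketch what chunked retrieval could look like on the client side (the paging interface below is invented for illustration; it is not an actual replication-protocol command, and fetch_page() is just a stand-in for a round trip to the server):

```python
# Hypothetical sketch: pull a large server-side list in fixed-size chunks
# instead of materializing the whole thing in client memory at once, the
# moral equivalent of a cursor over a relation list.

def fetch_in_chunks(fetch_page, chunk_size=1000):
    """Yield every item, requesting at most chunk_size items per round trip."""
    offset = 0
    while True:
        page = fetch_page(offset, chunk_size)
        if not page:
            return
        yield from page
        offset += len(page)

# Stand-in "server" holding 2,500 relation file names.
relation_files = ["base/16384/%d" % oid for oid in range(16385, 18885)]

def fake_server_page(offset, limit):
    return relation_files[offset:offset + limit]

fetched = list(fetch_in_chunks(fake_server_page))
```

The client's peak memory is then bounded by the chunk size rather than by the total number of files in PGDATA.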

> A third is that, as you mention
> further on, it means that the client has to care a lot more about
> exactly how the server is figuring out which blocks have been
> modified. If it just says BASE_BACKUP ..., the server can be
> internally reading each block and checking the LSN, or using
> WAL-scanning or ptrack or whatever and the client doesn't need to know
> or care. But if the client is asking for a list of modified files or
> blocks, then that presumes the information is available, and not too
> expensively, without actually reading the files.

I would think the client would be able to just ask for the list of
modified files, when it comes to building up the list of files to ask
for, which could potentially be done based on mtime instead of by WAL
scanning or by scanning the files themselves. Don't get me wrong, I'd
prefer that we work based on the WAL, since I have more confidence in
that, but certainly quite a few of the tools do work off mtime these
days and while it's not perfect, the risk/reward there is pretty
palatable to a lot of people.
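The mtime-based approach those tools take is, at its core, something like the following sketch (an illustration of the idea and its simplicity, not any tool's actual implementation, and it inherits mtime's known weaknesses, e.g. clock adjustments):

```python
# Hypothetical sketch of mtime-based change detection: report files under
# a directory whose modification time is newer than the prior backup's
# start time.
import os

def modified_since(data_dir, last_backup_time):
    """Return paths whose st_mtime is strictly newer than last_backup_time."""
    changed = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

The appeal is that it needs no WAL scanning and no reading of file contents; the risk is that mtime can lie, which is exactly why I'd prefer a WAL-based approach.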

> Fourth, MAX_RATE
> probably won't actually limit to the correct rate overall if the limit
> is applied separately to each file.

Sure, I hadn't been thinking about MAX_RATE and that would certainly
complicate things if we're offering to provide MAX_RATE-type
capabilities as part of this new set of commands.

> I'd be afraid that a patch that tried to handle all that as part of
> this project would get rejected on the grounds that it was trying to
> solve too many unrelated problems. Also, though not everybody has to
> agree on what constitutes a "modest amount of additional work," I
> would not describe solving all of those problems as a modest effort,
> but rather a pretty substantial one.

I suspect some of that's driven by how they get solved and if we decide
we have to solve all of them. With things like MAX_RATE + incremental
backups, I wonder how that's going to end up working, when you have the
option to apply the limit to the network, or to the disk I/O. You might
have addressed that elsewhere, I've not looked, and I'm not too
particular about it personally either, but a definition could be "max
rate at which we'll read the file you asked for on this connection" and
that would be pretty straightforward, I'd think.
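Under that "max read rate on this connection" definition, the throttling could be as simple as a token bucket applied to each read, along these lines (names and structure invented for illustration; this is not the server's MAX_RATE implementation):

```python
# Minimal token-bucket sketch of per-connection read-rate limiting.
import time

class ReadRateLimiter:
    def __init__(self, bytes_per_sec):
        self.rate = bytes_per_sec
        self.allowance = bytes_per_sec      # start with a full bucket
        self.last = time.monotonic()

    def throttle(self, nbytes):
        """Sleep just long enough to keep reads under bytes_per_sec."""
        now = time.monotonic()
        # Refill the bucket for the time elapsed, capped at one second's worth.
        self.allowance = min(self.rate,
                             self.allowance + (now - self.last) * self.rate)
        self.last = now
        self.allowance -= nbytes
        if self.allowance < 0:
            time.sleep(-self.allowance / self.rate)

def read_limited(f, limiter, chunk=8192):
    """Yield chunks from f, throttled to the limiter's rate."""
    while True:
        data = f.read(chunk)
        if not data:
            return
        limiter.throttle(len(data))
        yield data
```

Because the limiter lives with the connection rather than with any one file, the overall rate stays bounded even when many files are sent over that connection.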

> > > Well, one thing you might want to do is have a tool that connects to
> > > the server, enters backup mode, requests information on what blocks
> > > have changed, copies those blocks via direct filesystem access, and
> > > then exits backup mode. Such a tool would really benefit from a
> > > START_BACKUP / SEND_FILE_LIST / SEND_FILE_CONTENTS / STOP_BACKUP
> > > command language, because it would just skip ever issuing the
> > > SEND_FILE_CONTENTS command in favor of doing that part of the work via
> > > other means. On the other hand, a START_PARALLEL_BACKUP LSN '1/234'
> > > command is useless to such a tool.
> >
> > That's true, but I hardly ever hear people talking about how wonderful
> > it is that pgBackRest uses SSH to grab the data. What I hear, often, is
> > that people would really like backups to be done over the PG protocol on
> > the same port that replication is done on. A possible compromise is
> > having a dedicated port for the backup agent to use, but it's definitely
> > not the preference.
>
> If you happen to be on the same system where the backup is running,
> reading straight from the data directory might be a lot faster.

Yes, that's certainly true.

> > The comments that Anastasia had around the issues with being able to
> > identify the full backup that goes with a given incremental backup, et
> > al, certainly echoed some of my concerns regarding this part of the
> > discussion.
> >
> > As for the concerns about trying to avoid corruption from starting up an
> > invalid cluster, I didn't see much discussion about the idea of some
> > kind of cross-check between pg_control and backup_label. That was all
> > very hand-wavy, so I'm not too surprised, but I don't think it's
> > completely impossible to have something better than "well, if you just
> > remove this one file, then you get a non-obviously corrupt cluster that
> > you can happily start up". I'll certainly accept that it requires more
> > thought though and if we're willing to continue a discussion around
> > that, great.
>
> I think there are three different issues here that need to be
> considered separately.
>
> Issue #1: If you manually add files to your backup, remove files from
> your backup, or change files in your backup, bad things will happen.
> There is fundamentally nothing we can do to prevent this completely,
> but it may be possible to make the system more resilient against
> ham-handed modifications, at least to the extent of detecting them.
> That's maybe a topic for another thread, but it's an interesting one:
> Andres and I were brainstorming about it at some point.

I'd certainly be interested in hearing about ways we can improve on
that. I'm alright with it being on another thread as it's a broader
concern than just what we're talking about here.

> Issue #2: You can only restore an LSN-based incremental backup
> correctly if you have a base backup whose start-of-backup LSN is
> greater than or equal to the threshold LSN used to take the
> incremental backup. If #1 is not in play, this is just a simple
> cross-check at restoration time: retrieve the 'START WAL LOCATION'
> from the prior backup's backup_label file and the threshold LSN for
> the incremental backup from wherever you decide to store it and
> compare them; if they do not have the right relationship, ERROR. As
> to whether #1 might end up in play here, anything's possible, but
> wouldn't manually editing LSNs in backup metadata files be pretty
> obviously a bad idea? (Then again, I didn't really think the whole
> backup_label thing was that confusing either, and obviously I was
> wrong about that. Still, editing a file requires a little more work
> than removing it... you have to not only lie to the system, you have
> to decide which lie to tell!)

Yes, that'd certainly be at least one cross-check, but what if you've
got an incremental backup based on a prior incremental backup that's
based on a prior full, and you somehow skip the incremental backup in
between? Or are we just going to state outright that we don't support
incremental-on-incremental? In that case, all backups would actually be
either 'full' or 'differential' in the pgBackRest parlance (a parlance
that comes from my recollection of how other tools describe the
different backup types, though that was from many moons ago and might be
entirely wrong).
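To make the skipped-link worry concrete, here's a sketch of the cross-check from issue #2 extended to a chain of incrementals (the metadata field names are invented; the LSN comparison rule is the one described above):

```python
# Hypothetical sketch: validate a restore chain, full backup first, then
# each incremental taken against the backup before it. A link is missing
# if a prior backup's start LSN is older than the threshold LSN the next
# incremental was taken with.

def parse_lsn(text):
    """Parse an LSN of the form 'X/Y' into a comparable integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def validate_chain(chain):
    """chain: list of dicts with 'start_lsn', and for incrementals also
    'threshold_lsn' (the LSN the incremental was taken against)."""
    for prior, inc in zip(chain, chain[1:]):
        if parse_lsn(prior["start_lsn"]) < parse_lsn(inc["threshold_lsn"]):
            raise ValueError("incremental was taken against a newer backup; "
                             "a link in the chain appears to be missing")
    return True
```

Walking the whole chain this way catches the skipped-incremental case, because the later incremental's threshold LSN will be newer than the start LSN of the backup it's being stacked on.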

> Issue #3: Even if you clearly understand the rule articulated in #2,
> you might find it hard to follow in practice. If you take a full
> backup on Sunday and an incremental against Sunday's backup or against
> the previous day's backup on each subsequent day, it's not really that
> hard to understand. But in more complex scenarios it could be hard to
> get right. For example if you've been removing your backups when they
> are a month old and then you start doing the same thing once you
> add incrementals to the picture you might easily remove a full backup
> upon which a newer incremental depends. I see the need for good tools
> to manage this kind of complexity, but have no plan as part of this
> project to provide them. I think that just requires too many
> assumptions about where those backups are being stored and how they
> are being catalogued and managed; I don't believe I currently am
> knowledgeable enough to design something that would be good enough to
> meet core standards for inclusion, and I don't want to waste energy
> trying. If someone else wants to try, that's OK with me, but I think
> it's probably better to let this be a thing that people experiment
> with outside of core for a while until we see what ends up being a
> winner. I realize that this is a debatable position, but as I'm sure
> you realize by now, I have a strong desire to limit the scope of this
> project in such a way that I can get it done, 'cuz a bird in the hand
> is worth two in the bush.

Even if what we're talking about here is really only "differentials",
that is, backups where the incremental contains all the changes from a
prior full backup, if the only check is "full LSN is greater than or
equal to the incremental backup LSN", then you have a potential problem
that's larger than just the incrementals becoming invalid because you
removed the full backup on which they were based: you might think that
an *earlier* full backup is the one for a given incremental, perform a
restore with the wrong full/incremental matchup, and end up with a
corrupted database.
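One way to rule out that wrong matchup would be for the incremental to record a unique identity for its base, say the base's start LSN *and* its backup start time, and require an exact match rather than an inequality. As a sketch (field names hypothetical, not any existing metadata format):

```python
# Hypothetical sketch: an inequality on LSNs cannot distinguish an earlier
# full backup from the one an incremental was actually taken against, but
# an exact-identity check can.

def check_base_matches(incremental_meta, candidate_base_meta):
    """True only if this candidate is the exact base the incremental used."""
    return (incremental_meta["base_start_lsn"] == candidate_base_meta["start_lsn"]
            and incremental_meta["base_start_time"] == candidate_base_meta["start_time"])
```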

These are exactly the kind of issues that make me really wonder if this
is the right natural progression for pg_basebackup, or any backup tool,
to take. Maybe there are some additional things we can do to make it
harder for someone to end up with a corrupted database when they
restore, but it's really hard to get things like expiration correct. We
already see users ending up with problems because they don't manage
expiration of their WAL correctly, and now we're adding another level of
serious complication to the expiration requirements that, as we've seen
even on this thread, some users are just never going to feel comfortable
handling on their own.
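Just to show how quickly "remove backups older than a month" stops being safe once incrementals exist, here's a sketch of the dependency-aware pruning a user would effectively have to implement themselves (the metadata shape, with 'base' pointing at the parent backup, is invented):

```python
# Hypothetical sketch: time-based expiration that refuses to remove any
# backup some retained incremental still depends on.

def expire(backups, cutoff_time):
    """backups: dict name -> {'time': t, 'base': parent_name or None}.
    Return the names that are safe to delete: older than the cutoff AND
    not needed by any backup we're keeping."""
    needed = set()
    for name, meta in backups.items():
        if meta["time"] >= cutoff_time:
            # Walk the chain this retained backup depends on.
            cur = name
            while cur is not None:
                needed.add(cur)
                cur = backups[cur]["base"]
    return {name for name, meta in backups.items()
            if meta["time"] < cutoff_time and name not in needed}
```

Naive pruning by age alone would have deleted the old full here even though a newer incremental still needs it, which is precisely the corruption trap described above.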

Perhaps it's not relevant, and I get that you want to build this cool
incremental backup capability into pg_basebackup and I'm not going to
stop you from doing it, but if I were going to build a backup tool,
adding support for block-level incremental backup wouldn't be where I'd
start; in fact, I might not even get to it even after investing over 5
years in the project and after building in proper backup management.
The idea of implementing block-level incrementals while pushing the
backup management, expiration, and dependencies between incrementals and
fulls onto the user to figure out just strikes me as entirely backwards
and, frankly, as gratuitous 'itch scratching' at the expense of what
users really want and need here.

One of the great things about pg_basebackup is its simplicity and its
ability to serve as a one-time "give me a snapshot of the database"
tool, and this builds into it a complicated feature that *requires*
users to build their own basic capabilities externally in order to be
able to use it. I've tried to avoid getting into that here and I won't
go on about it, since it's your time to do with as you feel appropriate,
but I do worry that it makes us, as a project, look a bit more cavalier
about what users are asking for vs. what cool new thing we want to play
with than I, at least, would like us to be (so, I'll caveat that with
"in this area anyway", since I suspect saying this will probably come
back to bite me in some other discussion later ;).

Thanks,

Stephen
