Re: block-level incremental backup

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-24 13:28:15
Message-ID: 20190424132815.GS6197@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Mon, Apr 22, 2019 at 2:26 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > There was basically zero discussion about what things would look like at
> > a protocol level (I went back and skimmed over the thread before sending
> > my last email to specifically see if I was going to get this response
> > back..). I get the idea behind the diff file, the contents of which I
> > wasn't getting into above.
>
> Well, I wrote:
>
> "There should be a way to tell pg_basebackup to request from the
> server only those blocks where LSN >= threshold_value."
>
> I guess I assumed that people interested in the details would take
> that to mean "and therefore the protocol would grow an option for this
> type of request in whatever way is the most straightforward possible
> extension of the current functionality," which is indeed how you
> eventually interpreted it when you said we could "extend BASE_BACKUP
> by adding LSN as an optional parameter."

Looking at it from where I'm sitting, I brought up two ways that we
could extend the protocol to "request from the server only those blocks
where LSN >= threshold_value": one being the modification to
BASE_BACKUP and the other being a new set of commands that could be
parallelized. If I had assumed that you'd be thinking the same way I am
about extending the backup protocol, I wouldn't have said anything now,
and then I would have complained after you wrote a patch that just
extended the BASE_BACKUP command, at which point I likely would have
been told that it had already been done and that I should have
mentioned it earlier.
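
To make the contrast concrete, the two shapes look roughly like this
(the syntax is illustrative only; aside from BASE_BACKUP itself, these
command names are hypothetical, borrowed from later in this thread):

    -- Shape 1: bolt an option onto the existing command
    BASE_BACKUP LABEL 'nightly' INCREMENTAL LSN '1/2345678'

    -- Shape 2: finer-grained commands that a client could issue
    -- over several connections at once
    START_BACKUP
    SEND_FILE_LIST
    SEND_FILE_CONTENTS 'base/16384/16385' LSN '1/2345678'
    STOP_BACKUP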

> > external tools to leverage that. It sounds like what you're suggesting
> > now is that you're happy to implement the backend code, expose it in a
> > way that works just for pg_basebackup, and that if someone else wants to
> > add things to the protocol to make it easier for external tools to
> > leverage, great.
>
> Yep, that's more or less it, although I am potentially willing to do
> some modest amount of that other work along the way. I just don't
> want to prioritize it higher than getting the actual thing I want to
> build built, which I think is a pretty fair position for me to take.

At least in part, then, it seems like we're viewing the level of effort
around what I'm talking about quite differently, and I feel like that's
largely because every time I mention parallel anything there's this
assumption that I'm asking you to parallelize pg_basebackup or write a
whole bunch more code to provide a fully optimized server-side parallel
implementation for backups. That really wasn't what I was going for. I
was thinking it would be a modest amount of additional work to add
incremental backup via a few new commands, instead of through the
BASE_BACKUP protocol command, in a way that would make parallelization
possible.

Now, through this discussion, you've brought up some really good
points: my initial thought, that we could add some relatively simple
commands as part of this work to make it easier for someone to later
add parallel support to pg_basebackup (either full or incremental), or
for external tools to leverage, might not be the best solution when it
comes to having parallel backup in core, and therefore wouldn't
actually end up being useful towards that end. That's certainly a fair
point, and possibly enough to justify not spending even the modest time
I was thinking it'd need, but I'm not convinced. That said, if you are
convinced that's the case, and you're doing the work, then it's
certainly your prerogative to go in the direction you're convinced of.
I don't mean any of this discussion to imply that I'd object to a
commit that extended BASE_BACKUP in the way outlined above, but I
understood the question to be "what do people think of this idea?" and
to that I'm still of the opinion that spending a modest amount of time
to provide a way to parallelize an incremental backup is worth it, even
if it isn't optimal and isn't the direct goal of this effort.

There's a tangent on all of this that's pretty key though, which is the
question of just how the blocks are identified. If WAL scanning is done
to figure out the blocks, then that's quite a bit different from the
other idea of "open this relation and scan it, but only give me the
blocks after this LSN". It's the latter case that I've been mostly
thinking about in this thread, which is part of why I was thinking it'd
be a modest amount of work to have protocol commands that accepted a
file (or perhaps a relation..) to scan and return blocks from, instead
of baking this into BASE_BACKUP, which by definition just serially
scans the data directory and returns things as it finds them. For the
case where WAL scanning is happening and modified-block files
("modfiles") are being read and used to figure out the blocks to send,
it seems like it might be more complicated, and therefore potentially
quite a bit more work, to have a parallel version of that.
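
To be concrete about the latter case: at the page level, the filter is
just a comparison against pd_lsn in each page header. A minimal sketch,
reading a relation segment file directly and assuming the default 8kB
page size (a real server-side implementation would go through the
buffer manager, use PageGetLSN(), and worry about checksums and torn
pages):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define BLCKSZ 8192            /* default PostgreSQL page size */

    int
    main(int argc, char **argv)
    {
        FILE           *f;
        uint64_t        threshold;
        unsigned char   page[BLCKSZ];
        long            blkno = 0;

        if (argc != 3 || (f = fopen(argv[1], "rb")) == NULL)
        {
            fprintf(stderr, "usage: %s segment-file threshold\n", argv[0]);
            return 1;
        }
        threshold = strtoull(argv[2], NULL, 0);

        while (fread(page, 1, BLCKSZ, f) == BLCKSZ)
        {
            uint32_t    xlogid;
            uint32_t    xrecoff;
            uint64_t    lsn;

            /* pd_lsn is the first field of the page header */
            memcpy(&xlogid, page, 4);
            memcpy(&xrecoff, page + 4, 4);
            lsn = ((uint64_t) xlogid << 32) | xrecoff;

            if (lsn >= threshold)
                printf("block %ld: LSN %X/%X\n", blkno, xlogid, xrecoff);
            blkno++;
        }
        fclose(f);
        return 0;
    }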

> > All I can say is that that's basically how we ended up
> > in the situation we're in today where pg_basebackup doesn't support
> > parallel backup but a bunch of external tools do and they don't go
> > through the backend to get there, even though they'd probably prefer to.
>
> I certainly agree that core should try to do things in a way that is
> useful to external tools when that can be done without undue effort,
> but only if it can actually be done without undue effort. Let's see
> whether that's the case here:
>
> - Anastasia wants a command added that dumps out whatever the server
> knows about what files have changed, which I already agreed was a
> reasonable extension of my initial proposal.

That seems like a useful thing to have, I agree.

> - You said that for this to be useful to pgbackrest, it'd have to use
> a whole different mechanism that includes commands to request
> individual files and blocks within those files, which would be a
> significant rewrite of pg_basebackup that you agreed is more closely
> related to parallel backup than to the project under discussion on
> this thread. And that even then pgbackrest probably wouldn't use it
> because it also does server-side compression and encryption which are
> not included in this proposal.

Yes, having thought about it a bit more: without adding in the other
features that we already support in pgBackRest, it's unlikely we'd use
it in the form that I was contemplating. That said, it'd at least be
closer to something we could use, and adding those other features, such
as compression and encryption, would almost certainly be simpler and
easier if there were already protocol commands like those we discussed
for parallel work.
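
For instance (still entirely hypothetical syntax), the per-file command
could grow options much the way BASE_BACKUP grows options today:

    SEND_FILE_CONTENTS 'base/16384/16385' LSN '1/2345678'
        COMPRESSION 'gzip' CIPHER 'aes-256-cbc'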

> > Thanks for sharing your thoughts on that, certainly having the backend
> > able to be more intelligent about streaming files to avoid latency is
> > good and possibly the best approach. Another alternative to reducing
> > the latency would be to have a way for the client to request a set of
> > files, but I don't know that it'd be better.
>
> I don't know either. This is an area that needs more thought, I
> think, although as discussed, it's more related to parallel backup
> than $SUBJECT.

Yes, I agree with that.

> > I'm not really sure why the above is extremely inconvenient for
> > third-party tools, beyond just that they've already been written to work
> > with an assumption that the server-side of things isn't as intelligent
> > as PG is.
>
> Well, one thing you might want to do is have a tool that connects to
> the server, enters backup mode, requests information on what blocks
> have changed, copies those blocks via direct filesystem access, and
> then exits backup mode. Such a tool would really benefit from a
> START_BACKUP / SEND_FILE_LIST / SEND_FILE_CONTENTS / STOP_BACKUP
> command language, because it would just skip ever issuing the
> SEND_FILE_CONTENTS command in favor of doing that part of the work via
> other means. On the other hand, a START_PARALLEL_BACKUP LSN '1/234'
> command is useless to such a tool.

That's true, but I hardly ever hear people talking about how wonderful
it is that pgBackRest uses SSH to grab the data. What I hear, often, is
that people would really like backups to be done over the PG protocol on
the same port that replication is done on. A possible compromise is
having a dedicated port for the backup agent to use, but it's definitely
not the preference.

> Contrariwise, a tool that has its own magic - perhaps based on
> WAL-scanning or something like ptrack - to know which files currently
> exist and which blocks are modified could use SEND_FILE_CONTENTS but
> not SEND_FILE_LIST. And a filesystem-snapshot based technique might
> use START_BACKUP and STOP_BACKUP but nothing else.
>
> In short, providing granular commands like this lets the client be
> really intelligent even if the server isn't, and lets the client have
> fine-grained control of the process. This is very good if you're an
> out-of-core tool maintainer and your tool is trying to be smarter than
> - or even just differently-designed than - core.
>
> But if what you really want is just a maximally-efficient parallel
> backup, you don't need the commands to be fine-grained like this. You
> don't even really *want* the commands to be fine-grained like this,
> because it's better if the server works it all out so as to avoid
> unnecessary network round-trips. You just want to tell the server
> "hey, I want to do a parallel backup with 5 participants - hit me!"
> and have it do that in the most efficient way that it knows how,
> without forcing the client to make any decisions that can be made just
> as well, and perhaps more efficiently, on the server.
>
> On the third hand, one advantage of having the fine-grained commands
> is that it would not only make it easier for out-of-core tools to do
> cool things, but also in-core tools. For instance, you can imagine
> being able to do something like:
>
> pg_basebackup -D outputdir -d conninfo --copy-files-from=$PGDATA
>
> If the client is using what I'm calling fine-grained commands, this is
> easy to implement. If it's just calling a piece of server side
> functionality that sends back a tarball as a blob, it's not.
>
> So each approach has some pros and cons.

I agree that each has some pros and cons. Certainly one of the big
'cons' here is that it'd be a lot more backend work to implement the
'maximally-efficient parallel backup', while the fine-grained commands
wouldn't require nearly as much and would still, potentially, allow a
great deal of the benefit for both in-core and out-of-core tools.
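
To illustrate why the fine-grained commands buy parallelism fairly
cheaply on the client side, here's a rough libpq sketch of a per-worker
loop (SEND_FILE_CONTENTS is, again, a hypothetical command from this
discussion, and error handling and the actual writing of blocks are
elided):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Each worker would run this in its own process or thread,
     * with the file list round-robined across nworkers workers. */
    static void
    backup_worker(const char *conninfo, char **files, int nfiles,
                  int worker, int nworkers)
    {
        /* conninfo should request a replication connection */
        PGconn     *conn = PQconnectdb(conninfo);

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return;
        }

        for (int i = worker; i < nfiles; i += nworkers)
        {
            char        command[1024];
            PGresult   *res;

            snprintf(command, sizeof(command),
                     "SEND_FILE_CONTENTS '%s' LSN '1/2345678'", files[i]);
            res = PQexec(conn, command);
            /* ... write the returned blocks to the output directory ... */
            PQclear(res);
        }
        PQfinish(conn);
    }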

> > I'm disappointed that the concerns about the trouble that end users are
> > likely to have with this didn't garner more discussion.
>
> Well, we can keep discussing things. I've tried to reply to as many
> of your concerns as I can, but I believe you've written more email on
> this thread than everyone else combined, so perhaps I haven't entirely
> been able to keep up.
>
> That being said, as far as I can tell, those concerns were not
> seconded by anyone else. Also, if I understand correctly, when I
> asked how we could avoid that problem, you said you didn't know. And
> I said it seemed like we would need to do a very expensive operation at
> server startup, or magic. So I feel that perhaps it is a problem that
> (1) is not of great general concern and (2) to which no really
> superior engineering solution is possible.

The comments that Anastasia had around the issues with being able to
identify the full backup that goes with a given incremental backup, et
al, certainly echoed some of my concerns regarding this part of the
discussion.

As for the concerns about trying to avoid corruption from starting up an
invalid cluster, I didn't see much discussion about the idea of some
kind of cross-check between pg_control and backup_label. That was all
very hand-wavy, so I'm not too surprised, but I don't think it's
completely impossible to have something better than "well, if you just
remove this one file, then you get a non-obviously corrupt cluster that
you can happily start up". I'll certainly accept that it requires more
thought, though, and if we're willing to continue a discussion around
that, great.
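
To make that slightly less hand-wavy, the kind of cross-check I have in
mind would be along these lines (the pg_control field and the function
names here are hypothetical; nothing like this exists today):

    /* At startup, after reading pg_control: */
    if (ControlFile->backupLabelRequired && !backup_label_exists())
        ereport(FATAL,
                (errmsg("backup_label is missing, but pg_control indicates "
                        "this data directory was copied by a backup tool"),
                 errhint("Restore backup_label from the backup.")));

The backup tool (or a START_BACKUP command) would set the flag in
pg_control as part of taking the copy, so that simply removing
backup_label would no longer yield a cluster that happily starts up.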

Thanks,

Stephen
