Re: where should I stick that backup?

From: David Steele <david(at)pgmasters(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: where should I stick that backup?
Date: 2020-04-12 21:57:05
Message-ID: 8d106ed1-10d7-f94f-8e4d-860865c55269@pgmasters.net
Lists: pgsql-hackers

On 4/12/20 3:17 PM, Andres Freund wrote:
>
>> More generally, can you think of any ideas for how to structure an API
>> here that are easier to use than "write some C code"? Or do you think
>> we should tell people to write some C code if they want to
>> compress/encrypt/relocate their backup in some non-standard way?
>
>> For the record, I'm not against eventually having more than one way to
>> do this, maybe a shell-script interface for simpler things and some
>> kind of API for more complex needs (e.g. NetBackup integration,
>> perhaps). And I did wonder if there was some other way we could do
>> this.
>
> I'm doubtful that an API based on string replacement is the way to
> go. It's hard for me to see how that's not either going to substantially
> restrict the way the "tasks" are done, or yield a very complicated
> interface.
>
> I wonder whether the best approach here could be that pg_basebackup (and
> perhaps other tools) opens pipes to/from a subcommand and over the pipe
> it communicates with the subtask using a textual ([2]) description of
> tasks. Like:
>
> backup mode=files base_directory=/path/to/data/directory
> backup_file name=base/14037/16396.14 size=1073741824
> backup_file name=pg_wal/XXXX size=16777216
> or
> backup mode=tar
> base_directory /path/to/data/
> backup_tar name=dir.tar size=983498875687487

This is pretty much what pgBackRest does. We call them "local" processes
and they do most of the work during backup/restore/archive-get/archive-push.

> The obvious problem with that proposal is that we don't want to
> unnecessarily store the incoming data on the system pg_basebackup is
> running on, just for the subcommand to get access to them. More on that
> in a second.

We also implement "remote" processes so the local processes can get data
that doesn't happen to be local, i.e. on a remote PostgreSQL cluster.

> A huge advantage of a scheme like this would be that it wouldn't have to
> be specific to pg_basebackup. It could just as well work directly on the
> server, avoiding an unnecessary loop through the network. Which
> e.g. could integrate with filesystem snapshots etc. Without needing to
> build the 'archive target' once with server libraries, and once with
> client libraries.

Yes -- needing to store the data locally or stream it through one main
process is a major bottleneck.

Working on the server is key because it allows you to compress before
transferring the data. With parallel processing it is trivial to flood a
network. We have a recent example from a community user of backing up
25TB in 4 hours. Compression on the server makes this possible (and a
fast network, in this case).
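
As a rough back-of-the-envelope check on that figure, the sketch below
computes the sustained rate needed to move 25TB in 4 hours. It assumes
decimal terabytes and that 25TB is the uncompressed size; neither is
stated above, and the function name is purely illustrative.

```python
def required_throughput(total_bytes: int, seconds: float) -> float:
    """Return the average rate in bytes/second needed to move total_bytes."""
    return total_bytes / seconds


TB = 10**12  # assumption: decimal terabytes

# 25TB in 4 hours, as in the example above
rate = required_throughput(25 * TB, 4 * 3600)

print(f"{rate / 1e9:.2f} GB/s, {rate * 8 / 1e9:.1f} Gbit/s")
```

Under those assumptions the uncompressed stream would not fit on a
single 10 Gbit/s link, which is why compressing before the data leaves
the server matters.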

For security reasons, it's also nice to be able to encrypt data before
it leaves the database server. Calculating checksums/size at the source
is also ideal.
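
A minimal sketch of doing that work at the source in one pass: read the
file in chunks, hash and count the original bytes, and compress them on
the way out. This is not pgBackRest's code. zlib stands in for whatever
compressor is used, SHA-1 is assumed only because the checksum in the
example response further down is 40 hex digits, and encryption is left
out.

```python
import hashlib
import io
import zlib


def compress_with_checksum(src, dst, chunk_size=65536):
    """Stream src to dst through zlib.

    Returns (original_size, compressed_size, sha1_hex), where the
    checksum and original_size describe the uncompressed data.
    """
    sha1 = hashlib.sha1()
    compressor = zlib.compressobj()
    original_size = 0
    compressed_size = 0

    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Checksum and size are computed before compression
        sha1.update(chunk)
        original_size += len(chunk)
        compressed_size += dst.write(compressor.compress(chunk))

    # Flush whatever the compressor is still buffering
    compressed_size += dst.write(compressor.flush())
    return original_size, compressed_size, sha1.hexdigest()


if __name__ == "__main__":
    data = b"\0" * 65536
    out = io.BytesIO()
    print(compress_with_checksum(io.BytesIO(data), out))
```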

> One reason I think something like this could be advantageous over a C
> API is that it's quite feasible to implement it from a number of
> different languages, including shell if really desired, without needing
> to provide a C API via a FFI.

We migrated from Perl to C and kept our local/remote protocol the same,
which really helped. So, we had times when the C code was using a Perl
local/remote and vice versa. The idea is certainly workable in our
experience.

> It'd also make it quite natural to split out compression from
> pg_basebackup's main process, which IME currently makes it not really
> feasible to use pg_basebackup's compression.

This is a major advantage.

> There's various ways we could address the issue for how the subcommand
> can access the file data. The most flexible probably would be to rely on
> exchanging file descriptors between basebackup and the subprocess (these
> days all supported platforms have that, I think). Alternatively we
> could invoke the subcommand before really starting the backup, and ask
> how many files it'd like to receive in parallel, and restart the
> subcommand with that number of file descriptors open.

We don't exchange FDs. Each local is responsible for getting the data
from PostgreSQL or the repo based on knowing the data source and a path.
For pg_basebackup, however, I'd imagine each local would want a
replication connection with the ability to request specific files that
were passed to it by the main process.

> [2] yes, I already hear json. A line delimited format would have some
> advantages though.

We use JSON, but each protocol request/response is linefeed-delimited.
So for example here's what it looks like when the main process requests
a local process to back up a specific file:

{"cmd":"backupFile","param":["base/32768/33001",true,65536,null,true,0,"pg_data/base/32768/33001",false,0,3,"20200412-213313F",false,null]}

And the local responds with:

{"out":[1,65536,65536,"6bf316f11d28c28914ea9be92c00de9bea6d9a6b",{"align":true,"error":[0,[3,5],7],"valid":false}]}

We use arrays for parameters but of course these could be done with
objects for more readability.
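
To illustrate the framing, here is a small sketch of a
linefeed-delimited JSON exchange that uses objects for parameters. The
"cmd", "param" and "out" keys come from the messages above; the "file",
"size" and "err" keys and all function names are hypothetical, and the
handler is a placeholder rather than a real backup.

```python
import io
import json


def encode_message(message: dict) -> bytes:
    """Serialize one message as a single line of JSON.

    json.dumps escapes newlines inside strings, so the only raw
    linefeed on the wire is the terminator added here.
    """
    return json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"


def decode_messages(stream):
    """Yield one decoded message per non-empty line of a binary stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def handle_request(request: dict) -> dict:
    """Placeholder local process: echo the file name and size back."""
    if request.get("cmd") == "backupFile":
        param = request["param"]
        return {"out": {"file": param["file"], "size": param["size"]}}
    return {"err": f"unknown command: {request.get('cmd')}"}


if __name__ == "__main__":
    request = {
        "cmd": "backupFile",
        "param": {"file": "base/32768/33001", "size": 65536},
    }
    wire = encode_message(request)
    for message in decode_messages(io.BytesIO(wire)):
        print(encode_message(handle_request(message)).decode("utf-8"), end="")
```

Because each message is exactly one line, either side can be written in
any language that can read a line and parse JSON.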

We are considering a move to HTTP since lots of services (e.g. S3, GCS,
Azure, etc.) require it (so we implement it) and we're not sure it makes
sense to maintain our own protocol format. That said, we'd still prefer
to use JSON for our payloads (like GCS) rather than XML (as S3 does).

Regards,
--
-David
david(at)pgmasters(dot)net
