Re: where should I stick that backup?

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: where should I stick that backup?
Date: 2020-04-12 21:19:00
Message-ID: f6d3048d-99a1-8258-23d1-db8a9fa93506@pgmasters.net
Lists: pgsql-hackers

On 4/12/20 11:04 AM, Robert Haas wrote:
> On Sun, Apr 12, 2020 at 10:09 AM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> There are certainly cases for it. It might not be they have to be the same connection, but still be the same session, meaning before the first time you perform some step of authentication, get a token, and then use that for all the files. You'd need somewhere to maintain that state, even if it doesn't happen to be a socket. But there are definitely plenty of cases where keeping an open socket can be a huge performance gain -- especially when it comes to not re-negotiating encryption etc.
>
> Hmm, OK.

When we implemented connection-sharing for S3 in pgBackRest it was a
significant performance boost, even for large files, since they must be
uploaded in parts. The same goes for files transferred over SSH, though
in that case the overhead is per-file and can be mitigated with
OpenSSH's ControlMaster.
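
For anyone playing along, the ControlMaster mitigation looks something
like this in ~/.ssh/config (host name and timings here are just
examples, not what we ship):

```
Host backup-repo
    HostName repo.example.com
    # Multiplex all sessions to this host over one master connection
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    # Keep the master open for 10 minutes after the last session exits
    ControlPersist 10m
```

With this in place each per-file ssh invocation reuses the already
negotiated connection instead of paying the full handshake cost.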

We originally (late 2013) implemented everything with command-line
tools during the POC phase. The idea was to get something viable quickly
and then improve as needed. At the time our config file had entries
something like this:

[global:command]
compress=/usr/bin/gzip --stdout %file%
decompress=/usr/bin/gzip -dc %file%
checksum=/usr/bin/shasum %file% | awk '{print $1}'
manifest=/opt/local/bin/gfind %path% -printf '%P\t%y\t%u\t%g\t%m\t%T@\t%i\t%s\t%l\n'
psql=/Library/PostgreSQL/9.3/bin/psql -X %option%

[db]
psql_options=--cluster=9.3/main

[db:command:option]
psql=--port=6001

These paths are for macOS, but Linux would be similar.

This *did* work, but it was really hard to debug when things went wrong,
the per-file cost was high, and the slight differences between the
command-line tools on different platforms were maddening. For example,
many versions of 'find' would error out if a file disappeared while the
manifest was being built, which is a pretty common occurrence in a
running PostgreSQL cluster (most newer distros shipped an option to fix
this). I know that doesn't apply here, but it's an example. Debugging
was also complicated by the sheer number of processes: with any degree
of parallelism the process list got pretty crazy, fsync was not
happening, etc. It's been a long time, but I don't have any good
memories of the solution that used all command-line tools.
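
(For the curious, the fix I'm thinking of is GNU findutils'
-ignore_readdir_race, which suppresses the error when a file vanishes
between the directory scan and the stat. A sketch of a manifest pass
using it, assuming GNU find:)

```shell
#!/bin/sh
# Sketch: build a tab-separated manifest (path, size, mtime) of data
# files, tolerating files that vanish mid-scan -- common while
# PostgreSQL is running.
pgdata="${1:-.}"   # path to scan; defaults to the current directory
find "$pgdata" -ignore_readdir_race -type f -printf '%P\t%s\t%T@\n'
```

BSD find (as shipped on macOS) has no equivalent, which is exactly the
kind of platform difference that made the CLI approach painful.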

Once we had a POC that solved our basic problem, i.e. backing up about
50TB of data reasonably efficiently, we immediately started working on a
version that did not rely on command-line tools and we never looked
back. Currently the only command-line tool we use is ssh.

I'm sure it would be possible to create a solution that worked better
than ours, but I'm pretty certain it would still be hard for users to
make it work correctly and to prove it worked correctly.

>> For compression and encryption, it could perhaps be as simple as "the command has to be pipe on both input and output" and basically send the response back to pg_basebackup.
>>
>> But that won't help if the target is to relocate things...
>
> Right. And, also, it forces things to be sequential in a way I'm not
> too happy about. Like, if we have some kind of parallel backup, which
> I hope we will, then you can imagine (among other possibilities)
> getting files for each tablespace concurrently, and piping them
> through the output command concurrently. But if we emit the result in
> a tarfile, then it has to be sequential; there's just no other choice.
> I think we should try to come up with something that can work in a
> multi-threaded environment.
>
>> That is one way to go for it -- and in a case like that, I'd suggest the shellscript interface would be an implementation of the other API. A number of times through the years I've bounced ideas around for what to do with archive_command with different people (never quite to the level of "it's time to write a patch"), and it's mostly come down to some sort of shlib api where in turn we'd ship a backwards compatible implementation that would behave like archive_command. I'd envision something similar here.
>
> I agree. Let's imagine that there are a conceptually unlimited number
> of "targets" and "filters". Targets and filters accept data via the
> same API, but a target is expected to dispose of the data, whereas a
> filter is expected to pass it, via that same API, to a subsequent
> filter or target. So filters could include things like "gzip", "lz4",
> and "encrypt-with-rot13", whereas targets would include things like
> "file" (the thing we have today - write my data into some local
> files!), "shell" (which writes my data to a shell command, as
> originally proposed), and maybe eventually things like "netbackup" and
> "s3". Ideally this will all eventually be via a loadable module
> interface so that third-party filters and targets can be fully
> supported, but perhaps we could consider that an optional feature for
> v1. Note that there is quite a bit of work to do here just to
> reorganize the code.
>
> I would expect that we would want to provide a flexible way for a
> target or filter to be passed options from the pg_basebackup command
> line. So one might for example write this:
>
> pg_basebackup --filter='lz4 -9' --filter='encrypt-with-rot13
> rotations=2' --target='shell ssh rhaas(at)depository pgfile
> create-exclusive - %f.lz4'
>
> The idea is that the first word of the filter or target identifies
> which one should be used, and the rest is just options text in
> whatever form the provider cares to accept them; but with some
> %<character> substitutions allowed, for things like the file name.
> (The aforementioned escaping problems for things like filenames with
> spaces in them still need to be sorted out, but this is just a sketch,
> so while I think it's quite solvable, I am going to refrain from
> proposing a precise solution here.)

This is basically the solution we have landed on after many iterations.

We implement two types of filters, In and InOut. In filters process the
data and produce a result without modifying it, e.g. SHA1, size, page
checksum. InOut filters transform the data, e.g. compression,
encryption. Yeah, the names could probably be better...
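
To make the distinction concrete in miniature (a hypothetical sketch,
not our actual interface -- filter.intern.h has the real thing): an In
filter only observes the bytes flowing past and accumulates a result,
while an InOut filter rewrites them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified filter interface -- not the pgBackRest API. */
typedef struct Filter
{
    /* In filters observe input and accumulate into result; InOut filters
       also write transformed bytes to out and return the size written. */
    size_t (*process)(struct Filter *this, const unsigned char *in,
                      size_t inSize, unsigned char *out, size_t outSize);
    size_t result;                      /* e.g. total bytes seen */
} Filter;

/* An In filter: just counts bytes (think of a "size" filter). */
static size_t
sizeProcess(Filter *this, const unsigned char *in, size_t inSize,
            unsigned char *out, size_t outSize)
{
    (void)in; (void)out; (void)outSize;
    this->result += inSize;
    return 0;                           /* In filters produce no output */
}

/* An InOut filter: toy "encrypt-with-rot13" over alphabetic bytes. */
static size_t
rot13Process(Filter *this, const unsigned char *in, size_t inSize,
             unsigned char *out, size_t outSize)
{
    (void)this;
    assert(outSize >= inSize);

    for (size_t i = 0; i < inSize; i++)
    {
        unsigned char c = in[i];

        if (c >= 'a' && c <= 'z')
            c = 'a' + (c - 'a' + 13) % 26;
        else if (c >= 'A' && c <= 'Z')
            c = 'A' + (c - 'A' + 13) % 26;

        out[i] = c;
    }

    return inSize;
}
```

Because both kinds share one signature, a backup file can be pushed
through an arbitrary chain of them, which is what makes Robert's
--filter sketch compose naturally.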

I have attached our filter interface (filter.intern.h) as a concrete
example of how this works.

We call 'targets' storage and have a standard interface for creating
storage drivers. I have also attached our storage interface
(storage.intern.h) as a concrete example of how this works.

Note that for just performing a backup this is overkill, but once you
consider verification it is pretty much the minimum storage interface
needed, in our experience.
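
Again as a toy sketch (hypothetical names -- storage.intern.h has the
real interface): a storage driver is essentially a vtable, so backup and
verify code can run unchanged against posix, s3, and so on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal storage-driver vtable -- not the pgBackRest API.
   Backup needs little more than put; verify also needs exists and get. */
typedef struct StorageDriver
{
    bool   (*exists)(const char *path);
    size_t (*get)(const char *path, unsigned char *buf, size_t bufSize);
    void   (*put)(const char *path, const unsigned char *buf, size_t bufSize);
} StorageDriver;

/* Toy in-memory "driver" holding a single object, for illustration. */
static char memPath[256];
static unsigned char memData[256];
static size_t memSize;

static bool
memExists(const char *path)
{
    return memSize != 0 && strcmp(path, memPath) == 0;
}

static size_t
memGet(const char *path, unsigned char *buf, size_t bufSize)
{
    assert(memExists(path) && bufSize >= memSize);
    memcpy(buf, memData, memSize);
    return memSize;
}

static void
memPut(const char *path, const unsigned char *buf, size_t bufSize)
{
    assert(strlen(path) < sizeof(memPath) && bufSize <= sizeof(memData));
    strcpy(memPath, path);
    memcpy(memData, buf, bufSize);
    memSize = bufSize;
}

static const StorageDriver memDriver = {memExists, memGet, memPut};
```

The point is that once callers are written against the vtable, swapping
posix for s3 (or a test double like the above) is just a matter of
plugging in a different set of function pointers.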

Regards,
--
-David
david(at)pgmasters(dot)net

Attachment Content-Type Size
filter.intern.h text/plain 5.7 KB
storage.intern.h text/plain 16.7 KB
