From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: refactoring basebackup.c |
Date: | 2020-05-12 08:31:59 |
Message-ID: | CAFiTN-v9-xwjTKFiPUXCugmy5dZhBhvkcUqEwiBW35sXh0V1DQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, May 9, 2020 at 2:23 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Hi,
>
> I'd like to propose a fairly major refactoring of the server's
> basebackup.c. The current code isn't horrific or anything, but the
> base backup mechanism has grown quite a few features over the years
> and all of the code knows about all of the features. This is going to
> make it progressively more difficult to add additional features, and I
> have a few in mind that I'd like to add, as discussed below and also
> on several other recent threads.[1][2] The attached patch set shows
> what I have in mind. It needs more work, but I believe that there's
> enough here for someone to review the overall direction, and even some
> of the specifics, and hopefully give me some useful feedback.
>
> This patch set is built around the idea of creating two new
> abstractions, a base backup sink -- or bbsink -- and a base backup
> archiver -- or bbarchiver. Each of these works like a foreign data
> wrapper or custom scan or TupleTableSlot. That is, there's a table of
> function pointers that act like method callbacks. Every implementation
> can allocate a struct of sufficient size for its own bookkeeping data,
> and the first member of the struct is always the same, and basically
> holds the data that all implementations must store, including a
> pointer to the table of function pointers. If we were using C++,
> bbarchiver and bbsink would be abstract base classes.
>
> They represent closely-related concepts, so much so that I initially
> thought we could get by with just one new abstraction layer. I found
> on experimentation that this did not work well, so I split it up into
> two and that worked a lot better. The distinction is this: a bbsink is
> something to which you can send a bunch of archives -- currently, each
> would be a tarfile -- and also a backup manifest. A bbarchiver is
> something to which you send every file in the data directory
> individually, or at least the ones that are getting backed up, plus
> any that are being injected into the backup (e.g. the backup_label).
> Commonly, a bbsink will do something with the data and then forward it
> to a subsequent bbsink, or a bbarchiver will do something with the
> data and then forward it to a subsequent bbarchiver or bbsink. For
> example, there's a bbarchiver_tar object which, like any bbarchiver,
> sees all the files and their contents as input. The output is a
> tarfile, which gets send to a bbsink. As things stand in the patch set
> now, the tar archives are ultimately sent to the "libpq" bbsink, which
> sends them to the client.
>
> In the future, we could have other bbarchivers. For example, we could
> add "pax", "zip", or "cpio" bbarchiver which produces archives of that
> format, and any given backup could choose which one to use. Or, we
> could have a bbarchiver that runs each individual file through a
> compression algorithm and then forwards the resulting data to a
> subsequent bbarchiver. That would make it easy to produce a tarfile of
> individually compressed files, which is one possible way of creating a
> seekable achive.[3] Likewise, we could have other bbsinks. For
> example, we could have a "localdisk" bbsink that cause the server to
> write the backup somewhere in the local filesystem instead of
> streaming it out over libpq. Or, we could have an "s3" bbsink that
> writes the archives to S3. We could also have bbsinks that compresses
> the input archives using some compressor (e.g. lz4, zstd, bzip2, ...)
> and forward the resulting compressed archives to the next bbsink in
> the chain. I'm not trying to pass judgement on whether any of these
> particular things are things we want to do, nor am I saying that this
> patch set solves all the problems with doing them. However, I believe
> it will make such things a whole lot easier to implement, because all
> of the knowledge about whatever new functionality is being added is
> centralized in one place, rather than being spread across the entirety
> of basebackup.c. As an example of this, look at how 0010 changes
> basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer
> knows anything that is tar-specific, whereas right now it knows about
> tar-specific things in many places.
>
> Here's an overview of this patch set:
>
> 0001-0003 are cleanup patches that I have posted for review on
> separate threads.[4][5] They are included here to make it easy to
> apply this whole series if someone wishes to do so.
>
> 0004 is a minor refactoring that reduces by 1 the number of functions
> in basebackup.c that know about the specifics of tarfiles. It is just
> a preparatory patch and probably not very interesting.
>
> 0005 invents the bbsink abstraction.
>
> 0006 creates basebackup_libpq.c and moves all code that knows about
> the details of sending archives via libpq there. The functionality is
> exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq.
>
> 0007 creates basebackup_throttle.c and moves all code that knows about
> throttling backups there. The functionality is exposed for use by
> basebackup.c as a new type of bbsink, bbsink_throttle. This means that
> the throttling logic could be reused to throttle output to any final
> destination. Essentially, this is a bbsink that just passes everything
> it gets through to the next bbsink, but with a rate limit. If
> throttling's not enabled, no bbsink_throttle object is created, so all
> of the throttling code is completely out of the execution pipeline.
>
> 0008 creates basebackup_progress.c and moves all code that knows about
> progress reporting there. The functionality is exposed for use by
> basebackup.c as a new type of bbsink, bbsink_progress. Since the
> abstraction doesn't fit perfectly in this case, some extra functions
> are added to work around the problem. This is not entirely elegant,
> but I don't think it's still an improvement over what we have now, and
> I don't have a better idea.
>
> 0009 invents the bbarchiver abstraction.
>
> 0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
> bbarchiver, and refactors basebackup.c to make use of them. The tar
> bbarchiver puts the files it sees into tar archives and forwards the
> resulting archives to a bbsink. The tarsize bbarchiver is used to
> support the PROGRESS option to the BASE_BACKUP command. It just
> estimates the size of the backup by summing up the file sizes without
> reading them. This approach is good for a couple of reasons. First,
> without something like this, it's impossible to keep basebackup.c from
> knowing something about the tar format, because the PROGRESS option
> doesn't just figure out how big the files to be backed up are: it
> figures out how big it thinks the archives will be, and that involves
> tar-specific considerations. This area needs more work, as the whole
> idea of measuring progress by estimating the archive size is going to
> break down as soon as server-side compression is in the picture.
> Second, this makes the code path that we use for figuring out the
> backup size details much more similar to the path we use for
> performing the actual backup. For instance, with this patch, we
> include the exact same files in the calculation that we will include
> in the backup, and in the same order, something that's not true today.
> The basebackup_tar.c file added by this patch is sadly lacking in
> comments, which I will add in a future version of the patch set. I
> think, though, that it will not be too unclear what's going on here.
>
> 0011 invents another new kind of bbarchiver. This bbarchiver just
> eavesdrops on the stream of files to facilitate backup manifest
> construction, and then forwards everything through to a subsequent
> bbarchiver. Like bbsink_throttle, it can be entirely omitted if not
> used. This patch is a bit clunky at the moment and needs some polish,
> but it is another demonstration of how these abstractions can be used
> to simplify basebackup.c, so that basebackup.c only has to worry about
> determining what should be backed up and not have to worry much about
> all the specific things that need to be done as part of that.
>
> Although this patch set adds quite a bit of code on net, it makes
> basebackup.c considerably smaller and simpler, removing more than 400
> lines of code from that file, about 20% of the current total. There
> are some gratifying changes vs. the status quo. For example, in
> master, we have this:
>
> sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
> bool sendtblspclinks, backup_manifest_info *manifest,
> const char *spcoid)
>
> Notably, the sizeonly flag makes the function not do what the name of
> the function suggests that it does. Also, we've got to pass some extra
> fields through to enable specific features. With the patch set, the
> equivalent function looks like this:
>
> archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
> List *tablespaces, bool sendtblspclinks)
>
> The question "what should I do with the directories and files we find
> as we recurse?" is now answered by the choice of which bbarchiver to
> pass to the function, rather than by the values of sizeonly, manifest,
> and spcoid. That's not night and day, but I think it's better,
> especially as you imagine adding more features in the future. The
> really important part, for me, is that you can make the bbarchiver do
> anything you like without needing to make any more changes to this
> function. It just arranges to invoke your callbacks. You take it from
> there.
>
> One pretty major question that this patch set doesn't address is what
> the user interface for any of the hypothetical features mentioned
> above ought to look like, or how basebackup.c ought to support them.
> The syntax for the BASE_BACKUP command, like the contents of
> basebackup.c, has grown organically, and doesn't seem to be very
> scalable. Also, the wire protocol - a series of CopyData results which
> the client is entirely responsible for knowing how to interpret and
> about which the server provides only minimal information - doesn't
> much lend itself to extensibility. Some careful design work is likely
> needed in both areas, and this patch does not try to do any of it. I
> am quite interested in discussing those questions, but I felt that
> they weren't the most important problems to solve first.
>
> What do you all think?
The overall idea looks quite nice. I had a look at some of the
patches at least 0005 and 0006. At first look, I have one comment.
+/*
+ * Each archive is set as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
Some of the bbsink_libpq_* functions are directly calling pq_* e.g.
bbsink_libpq_begin_backup whereas others are calling SendCopy*
functions and therein those are calling pq_* functions. I think
bbsink_libpq_* function can directly call pq_* functions instead of
adding one more level of the function call.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2020-05-12 09:35:27 | Re: Problem with logical replication |
Previous Message | Fabien COELHO | 2020-05-12 07:43:10 | Re: PG 13 release notes, first draft |