Re: Btrfs clone WIP patch

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jonathan Rogers <jrogers(at)socialserve(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Btrfs clone WIP patch
Date: 2013-03-09 17:30:15
Message-ID: 513B71A7.6080506@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/1/13 1:40 AM, Jonathan Rogers wrote:
> I've been thinking about both of these issues and decided to try a
> different approach. This patch adds GUC options for two external
> commands

This is a reasonable approach for a proof of concept patch. I like the
idea you're playing with here, as a useful boost for both production
deployments and development synchronization jobs. There are some major
coding hurdles for this idea to go beyond proof of concept into
committed feature though, so I'm just going to dump all the obvious ones
I can see on you too. Feel free to ignore those and focus on the fun
usability / capability parts for a while first. Just want you to
realize what work is in your way, but not visible to you necessarily.

With so many GUCs already, adding more of them to support something with
a fairly narrow user base will be hard to justify committing. This sort
of thing seems like you could load it as an extension instead. I don't
think this is worth doing yet. For now the GUC approach is fine, and it
keeps your code sample small.

I'm going to start with how to continue validating this idea is useful
without worrying about those, to get some more testing/benchmarking
samples, and for all of that job what you're doing so far is fine. The
road toward something to commit is going to take a lot more complicated
code eventually though. There's no big conceptual leap involved, it's
just where the changes need to go and how much they need to touch to do
this accurately in all cases isn't as simple as your demo. Handling
unusual situations and race conditions turns out to be almost all the
code in what you're replacing, and you'll need similar work in the
replacements.

> I have just been experimenting with ZFS and it does not seem to have any
> capability or interface for cloning ordinary files or directories so the
> configuration is not as straightforward. However, I was able to set up a
> Postgres cluster as a hierarchy of ZFS file systems in the same pool
> with each directory under "base" being a separate file system and
> configure Postgres to call shell scripts which call zfs snapshot and
> clone commands to do the cloning and deleting.

To make things easier for a reviewer, it's useful to include working
examples like this as part of your submission. For this one that would
include:

-A sample script that created a test pool
-Working postgresql.conf changes compatible with ZFS
-Test program showing how to exercise the feature

I think I can see how to construct such an example for the btrfs
version, but having you show that explicitly (preferably with a whole
sample session executing it) will also help reviewers. Remember: if
you want to get your submission off to a good start, the reviewer should
be able to run your sample test, see the code work, and do something fun
within a few seconds of compiling it. Make that easy for them, and your
reviewer will start with a good impression of you and a positive outlook
for the change.

Now onto the code nitpicking!

= Extension vs. GUC =

In addition to not polluting the postgresql.conf.sample, there's another
reason this might make for better extension material eventually. Adding
new execv calls that are building strings like this is just generally a
risky thing. It would be nice from a security perspective if that
entire mechanism wasn't even introduced into the server at all unless
someone loaded the extension.

An extension implementation will end up being more code, both to add a
useful hook for replace these calls and for the extension packaging
itself. Having a bit more code in contrib/ but less in src &
postgresql.conf is probably a net positive trade though.

= Diff reduction / code refactoring =

Looks like you added a "File Operation Options" entry into guc.c but
then not use it here? I would just keep this in an existing category
for now, try to reduce the diff length of the proof of concept version
as much as possible in the beginning.

On the topic of smaller diffs, the similar cut and paste sections of the
two entries that both do fork/exec/waitpid should really be refactored
into one function. The archiver does something similar for running
archive_command, there may be some reuse or refactoring of its code to
present this interface.

Again, this sort of refactoring is not necessary as a POC patch. But it
will probably come up if this moves toward commit candidate.

= Recursion and directory navigation =

> In either case, the directories are copied recursively while the
> Postgres internal copydir function does not recurse. I don't think that
> should be a problem since there shouldn't be nested directories in the
> first place.

copydir takes an option for whether it should recurse or not.

The rm side of makes me twitch for a number of reasons. First off,
there's just the general scariness of the concept of shelling out to run
rm recursively with some text string you build. The worst time I saw a
bug in that sort of code destroyed a terabyte, and the last time I saw
such a bug was only a week ago. Validation you're doing the right thing
is always job #1 in removing files.

Specifically here, take a look at src/port/dirmod.c for a) its comments
on race conditions and b) how it does error handling. The external rm
will need to duplicate all of that. I don't see how you could possibly
replace rmtree with something simplier that has the same necessary
properties. I think what you're actually going to need is a hook to
replace both the unlink and rmdir calls with your alternate
implementation idea.

The same sort of issue is in your external_copydir. Iterating into
subdirectories when it doesn't happen now just isn't safe, even though
the one expected case you're hooking won't be any different. You really
can't just do that. Would this work instead, and is there any concern
about files that start with a "."?

"cp * --reflink=auto"

Regardless, you need to keep most of the structure to copydir anyway.
Error handling, handling cancellation, and fsync calls are all vital
things. You probably have to make the forked command copy a single file
at a time to get the same interrupt handling behavior.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2013-03-09 17:42:37 Re: Call for Google Summer of Code mentors, admins
Previous Message Simon Riggs 2013-03-09 16:04:49 Re: Enabling Checksums