Re: PATCH: Configurable file mode mask

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Adam Brightwell <adam(dot)brightwell(at)crunchydata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Configurable file mode mask
Date: 2018-04-06 13:15:15
Message-ID: 20180406131514.GO27724@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Michael Paquier (michael(at)paquier(dot)xyz) wrote:
> On Thu, Apr 05, 2018 at 12:08:15PM -0400, David Steele wrote:
> > On 4/5/18 2:55 AM, Michael Paquier wrote:
> >> On Wed, Apr 04, 2018 at 08:03:54PM -0400, David Steele wrote:
> >>
> >>> Instead I have created variables in file_perm.c
> >>> that hold the current file create mode, dir create mode, and mode mask.
> >>> All call sites use those variables (e.g. pg_dir_create_mode), which I
> >>> think are much clear.
> >>
> >> Hm. I personally find that even more confusing, especially with
> >> SetDataDirectoryCreatePerm which basically is the same as
> >> SetConfigOption for the backend.
> >
> > The GUC shows the current mode of the data directory, while the
> > variables in file_perm.c store the mode that should be used to create
> > new dirs/files. One is certainly based on the other but I thought it
> > best to split them for clarity.
>
> Yeah, there are arguments to actually have both things split so as they
> can be concatenated. This makes the code more modular.

I prefer having them split as well.

> >> You also forgot a call to SetDataDirectoryCreatePerm or
> >> pg_dir_create_mode remains to its default.
>
> My apologies, I forgot the last two words of this sentence: "for
> pg_rewind". Still, I am wrong because GetDataDirectoryCreatePerm calls
> internally SetDataDirectoryCreatePerm. So the API name is confusing
> because not only the permissions are fetched, but they are also
> applied.

They're not *actually* applied though- that just sets the variable, and
having GetDataDirectoryCreatePerm get what the perm is and then set a
variable based off of that, so we know what it's set to later, seems
reasonable to me.

> >> The interface of file_perm.h that you are introducing is not confusing
> >> anymore though..
> >
> > Yes, that was the idea. I think it makes it clearer what we are trying
> > to do and centralizes variables related to create modes in one place.
> >
> > I've attached new patches that exclude Windows from permissions tests
> > and deal with bootstrap permissions in a better way. PostgresMain() and
> > AuxiliaryProcessMain() now use checkDataDir() to set permissions.
>
> GetDataDirectoryCreatePerm() is defined in file_perm.h but it is only
> declared for frontends.

Right, the job that GetDataDirectoryCreatePerm() has in the front-ends
is more complicated when the backend is starting up because we're also
checking who the data directory is owned by and we're setting the
internal GUC, all of which is done by checkDataDir(), which also calls
SetDataDirectoryCreatePerm() to set the variables.

> At the end, this patch brings in a new abstraction concept to
> src/common/ to control a set of pre-determined behaviors:
> - The mode to create directories.
> - The mode to create files.
> - The mode number which can be used for umask.
> I am not really convinced that we need to enforce all of them all the
> time with a API layer aimed at controlling the behavior of all things at
> the same time. Getting this abstraction down one level by allowing each
> frontend to set up things the way they want would be nicer in my
> opinion. Perhaps others feel differently...

I don't think we actually *want* the various tools making different
decisions about what permissions the files in a given data directory
have- that's initdb's job, not the job of pg_resetwal.

> It could be also less confusing if the set of variables in file_perm.h
> was designed in such a way that we have the default, and then we can
> apply on top of it the set to allow grouping, so as allowing grouping
> access would be to do the operation of (default mask + group addition).

This seems like we're going around-and-around. We had all the different
masking things happening previously, but that got rather ugly as there
were just small variations between "right" and "wrong". I like this
approach which makes it pretty clear that when we start up, we figure
out what the perms should be for files in this data dir, and then we set
the variables for the permissions and use them.

> The design on the backend is rather neat however there is also
> overlapping with SetDataDirectoryCreatePerm() and the GUC
> data_directory_mode which are heading toward the same types of goals.

Just above we discuss how we want those to be seperate, and here it
seems you're asking for them to be put together. We really can't have
it both ways in this case and I tend to agree with the above discussion
where we keep them seperate.

> We could reduce that confusion by designing the interface as follows:
> - Have read-only GUCs for the directory and file masks on the backend
> which get set by the postmaster after looking at the permission of the
> data folder at startup. Then if a file or a folder needs to be created,
> then look at those values and apply permissions as granted. And also a
> GUC to decide the umask to apply but that should not be necessary,
> right?

This is more-or-less what we've got now- a read-only GUC for the
directory and file masks on the backend which gets set by the postmaster
after looking at the permission of the data folder on startup- that all
happens in checkDataDir() now, which looks good to me. We then have the
variables set by SetDataDirectoryCreatePerm() which happens just before
the GUC is set in the same area of checkDataDir(). We need to set them
both if we're going to keep them seperate, which I believe is what we
really want here.

> - Frontends are responsible for the decision-making of the permissions.

No. Frontends need to follow what the permissions are on the data
directory- having them decide independently will just lead to things
being inconsistent and that's definitely not something that we want.

> Not all the variables are used for frontends as well. For example
> pg_resetwal and pg_upgrade only touch files.

While true today, I don't think it's an issue and I'd rather keep the
directory and file privilege settings in the same place (where else
would you put the directory ones?).

> - For frontends, there are two cases:
> -- The client needs to connect to a live Postgres instance, in which
> case it can use the SHOW command to get the wanted information. This
> applies to pg_rewind with the remote mode (should apply to the second
> case actually), pg_basebackup, pg_receivexlog, etc.

Right, that's what pg_basebackup, pg_receivewal, etc, do now with this
patch..

> -- Binaries work on a local data folder, so permissions can be guessed
> from that: pg_rewind, pg_resetwal and pg_upgrade. Having an API in
> src/common/ which scans for what to apply is neat. This was in v12 and
> some older versions if I recall correctly.

I'm confused here as that's what GetDataDirectoryCreatePerm()
specifically does and that's why it's in src/common now for the
front-end tools to use, or were you just agreeing that this was in v12
previously and you're happy to see that it's back..?

> We are two days away from the end of the commit fest, and this patch set
> if not yet in a committable stagle, so perhaps at this stage we had
> better just retreat and come back to it in next cycle?

I've been watching this and discussing things with David while you and
he have been working on it and I think the discussion has generally been
good, so I didn't want to step in and disrupt it, but there's a few
things here which now seem to be going in circles without a lot of
benefit. There's a few comments which I have on the patch after going
over it again, but I tend to feel it's actually pretty close as none of
the comments I have at this point are serious issues- you and David have
addressed those (I'm *very* glad that the pg_dump changes were ripped
out, for instance, and having the some of the regression tests committed
independently was certainly good...) and I don't see any of the above as
being really concerning, but do let me know if you think there's
something I'm missing or some other issue.

I'll reply to David's last email (where the latest set of patches were
included) with my comments/suggestions and I expect we'll be able to get
those addressed today and have a final patch to post tonight, with an
eye towards committing it tomorrow.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-04-06 13:23:32 Re: pgsql: New files for MERGE
Previous Message Julian Markwort 2018-04-06 13:03:30 Re: [FEATURE PATCH] pg_stat_statements with plans (v02)