From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Allow replication roles to use file access functions |
Date: | 2015-09-03 02:05:24 |
Message-ID: | CAB7nPqRudM6GScUj3_Kvxev+JOPTKtG_k1d316=mzMed_+cMeA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 3, 2015 at 10:05 AM, Stephen Frost wrote:
> * Michael Paquier (michael(dot)paquier(at)gmail(dot)com) wrote:
>> On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera wrote:
> The replication role already has read-only access to everything
> (nearly?) in the PGDATA directory. The specific issue in this case, I
> believe, is if the functions mentioned provide access beyond what the
> replication role already has access to.
It seems to me that this difference of files is symbolized by what is
filtered out in a base backup, like postmaster.opts or postmaster.pid.
The matter here is: are we going to have in the future files that we
do not want to include in a base backup that are superuser-only? Today
I cannot recall we have any of that, we may have in the future though.
>> I don't entirely understand why pg_rewind should be invoking any of these
>> functions to begin with. Isn't it working directly with the disk data,
>> with both postmasters shut down?
>
> Certain information is needed from the new master to rewind the old
> master. What I would hope is that we'd provide a way for *just* that
> information to be made available to the replication role rather than
> completely general access through the functions mentioned.
The rewound node does not only need the diffs of blocks from a source,
but also some more information like clog. There are actually three
different approaches to solve that:
1) Use a differential backup to me, or the possibility to fetch a set
of data block diffs from a source node using an LSN and then re-apply
them on the target node. The major disadvantage of this approach is
actually performance: we would need to scan the source node
completely, so that's slow. And pg_rewind is fast because it only
scans the WAL records of the node to be rewound from the last
checkpoint before WAL forked.
2) Request data blocks from the source node using the replication
protocol, that's what pg_read_binary_file actually does, we just don't
have the logic on replication protocol side, though we could.
3) Create a new set of functions similar to the existing file access
functions that are usable by the replication user, except that they
refuse to return file entries that match the existing filters in
basebackup.c. That is doable, with a routine in basebackup.c that
decides if a given file string can be read or not. Base backups could
use this routine as well.
I just came up with 3), which looks rather appealing to me.
>> > Another problem with it is that access to the filesystem is about halfway
>> > to being superuser anyway, even if it's only read-only access. It would
>> > certainly let you read things that one would not expect a replication
>> > user to be able to read (like files unrelated to Postgres).
>>
>> Doesn't pg_read_file et al insist that the path is below the data
>> directorY?
>
> I don't believe that's the case, actually.. I might be wrong though,
> I'm not looking at the code right now.
The use of PGDATA is enforced
=# select pg_read_file('/etc/passwd');
ERROR: 42501: absolute path not allowed
LOCATION: convert_and_check_filename, genfile.c:73
If PGDATA includes soft links it is possible to point to those places
though, but that's not something anybody sane would do, still it can
be done and I've seen worse:
$ cd $PGDATA && ln -s /etc etc
$ psql -c 'select pg_read_file('etc/passwd')'
# Adult content, not an ERROR
From this perspective allowing a replication user to have access to
those functions is dangerous, a superuser being the equivalent of an
all-mightly god on a server, still that's not the case of the
replication user.
> This is definitely a big part of
> the question, but I'd like to ask- what, exactly, does pg_rewind
> actually need access to? Is it only the WAL, or are heap and WAL files
> needed?
Not only, +clog, configuration files, etc.
> Consider the discussion about delta backups, et al, using LSNs. Perhaps
> the replication protocol should be extended to allow access to arbitrary
> WAL, querying what WAL is available, and access to the heap files,
> perhaps even specific pages in the heap files and relation forks,
> instead of giving pg_rewind access to these extremely general
> nearly-OS-user-level functions.
The problem when using differential backups in this case is
performance as mentioned above. We would need to scan the whole target
cluster, which may take time, the current approach of pg_rewind only
needs to scan WAL records to find the list of blocks modified, and
directly requests them from the source. I would expect pg_rewind to be
as quick as possible.
> Further, for my two cents, I view any access to the system by a
> replication role to a non-replication-protocol communication with the
> server as being inappropriate. To that point, I'd actually remove the
> ability for the replication roles to call pg_start/stop_backup as normal
> SQL calls since that implies the replication user has connected to a
> normal backend instead of talking the replication protocol. I'm not
> sure if we can really put that cat back in the bag, but I'd be a lot
> happier if we could. Allowing access to these functions outside of the
> replication protocol doubles-down on that bad idea, in my view.
That would break many existing applications, but I definitely see the
rightful concern here: basically replication roles should just use the
replication protocol and have no access to directly to SQL.
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2015-09-03 02:20:36 | Re: Allow replication roles to use file access functions |
Previous Message | Jim Nasby | 2015-09-03 01:46:21 | Re: Fwd: Core dump with nested CREATE TEMP TABLE |