Re: Column Filtering in Logical Replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2021-12-18 01:59:00
Message-ID: 670fbd5b-5fc3-8c30-a7bf-ad1d6ea2d4a5@enterprisedb.com
Lists: pgsql-hackers

On 12/18/21 02:34, Alvaro Herrera wrote:
> On 2021-Dec-17, Tomas Vondra wrote:
>
>> On 12/17/21 22:07, Alvaro Herrera wrote:
>>> So I've been thinking about this as a "security" item (you can see my
>>> comments to that effect sprinkled all over this thread), in the sense
>>> that if a publication "hides" some column, then the replica just won't
>>> get access to it. But in reality that's mistaken: the filtering that
>>> this patch implements is done based on the queries that *the replica*
>>> executes of its own volition; if the replica decides to ignore the list
>>> of columns, it'll be able to get all columns. All it takes is an
>>> uncooperative replica in order for the lot of data to be exposed anyway.
>>
>> Interesting, I haven't really looked at this as a security feature. And in
>> my experience if something is not carefully designed to be secure from the
>> get go, it's really hard to add that bit later ...
>
> I guess the way to really harden replication is to use the GRANT system
> at the publisher's side to restrict access for the replication user.
> This would provide actual security. So you're right that I seem to be
> barking up the wrong tree ... maybe I need to take a careful look at
> the documentation for logical replication to understand what is being
> offered, and to make sure that we explicitly indicate that limiting the
> column list does not provide any actual security.
>
>> You say it's the replica making the decisions, but my mental model is it's
>> the publisher decoding the data for a given list of publications (which
>> indeed is specified by the subscriber). But the subscriber can't tweak the
>> definition of publications, right? Or what do you mean by queries executed
>> by the replica? What is the gap?
>
> I am thinking of somebody modifying the code that the replica runs, so
> that it ignores the column list that the publication has been configured
> to provide; instead of querying only those columns, it would query all
> columns.
>
>>> If the server has a *separate* security mechanism to hide the columns
>>> (per-column privs), it is that feature that will protect the data, not
>>> the logical-replication-feature to filter out columns.
>>
>> Right. Although I haven't thought about how logical decoding interacts with
>> column privileges. I don't think logical decoding actually checks column
>> privileges - I certainly don't recall any ACL checks in
>> src/backend/replication ...
>
> Well, in practice if you're confronted with a replica that's controlled
> by a malicious user that can tweak its behavior, then replica-side
> privilege checking won't do anything useful.
>

I don't follow. Surely the decoding happens on the primary node, right?
Which is where the ACL checks would happen, using the role the
replication connection is opened with.
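
To make the distinction concrete, here is a toy sketch in Python. It is
not PostgreSQL code: the row, the column sets, and publisher_copy() are
all hypothetical, and it only assumes that the sync query is chosen by
the subscriber while any privilege check runs on the publisher, against
the role the connection was opened with.

```python
# Toy model, not PostgreSQL code: every name here is hypothetical.

ROW = {"id": 1, "name": "alice", "salary": 5000}

# Column list attached to the publication. In this model it is only
# advice that a cooperative subscriber follows when building its query.
PUBLICATION_COLUMNS = {"id", "name"}

# Columns the replication role is allowed to read on the publisher
# (standing in for column-level privileges).
GRANTED_COLUMNS = {"id", "name"}


def publisher_copy(requested, enforce_acl):
    """Serve a sync query whose column list was chosen by the subscriber.

    The publisher cannot control what is requested; it can only refuse
    the request based on its own privilege checks.
    """
    if enforce_acl:
        denied = set(requested) - GRANTED_COLUMNS
        if denied:
            raise PermissionError(f"permission denied for columns: {sorted(denied)}")
    return {col: ROW[col] for col in requested}


# A cooperative subscriber asks only for the published columns.
cooperative = publisher_copy(sorted(PUBLICATION_COLUMNS), enforce_acl=False)

# An uncooperative subscriber ignores the column list and asks for everything.
leaked = publisher_copy(sorted(ROW), enforce_acl=False)

# With privileges enforced on the publisher, the same request is rejected.
try:
    publisher_copy(sorted(ROW), enforce_acl=True)
    blocked = False
except PermissionError:
    blocked = True
```

In this model the column list alone does not stop the second request;
only the publisher-side check does.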

>>> This led me to realize that the replica-side code in tablesync.c is
>>> totally oblivious to the publication through which a table is
>>> being received in the replica. So we're not aware of a replica
>>> being exposed only a subset of columns through some specific
>>> publication; and a lot more hacking is needed than this patch does, in
>>> order to be aware of which publications are being used.
>
>> Does that mean we currently sync all the columns in the initial sync, and
>> only start filtering columns later while decoding transactions?
>
> No, it does filter the list of columns in the initial sync. But the
> current implementation is bogus, because it obtains the list of *all*
> publications in which the table is published, not just the ones that the
> subscription is configured to get data from. And the sync code doesn't
> receive the list of publications. We need more thorough patching of the
> sync code to close that hole.

Ah, got it. Thanks for the explanation. Yeah, that makes no sense.
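
A toy sketch of the hole as described, in Python rather than the actual
tablesync.c code. The catalog contents and both function names are
hypothetical, and combining column lists by union is an assumption made
only for illustration.

```python
# Toy model, not PostgreSQL code: every name here is hypothetical.

# Hypothetical catalog: publication name -> {table name: column list}.
PUBLICATIONS = {
    "pub_public": {"employees": {"id", "name"}},
    "pub_hr": {"employees": {"id", "name", "salary"}},
}


def columns_all_publications(table):
    """Behaviour described as bogus: look at every publication that
    contains the table, whether or not the subscription uses it."""
    cols = set()
    for tables in PUBLICATIONS.values():
        cols |= tables.get(table, set())
    return cols


def columns_for_subscription(table, subscribed):
    """Intended behaviour: look only at the publications the
    subscription is configured to receive data from."""
    cols = set()
    for pub in subscribed:
        cols |= PUBLICATIONS[pub].get(table, set())
    return cols


# A subscription that only uses pub_public should never see "salary".
buggy = columns_all_publications("employees")
fixed = columns_for_subscription("employees", ["pub_public"])
```

Here buggy includes "salary" because pub_hr is consulted too, while
fixed stays limited to the columns of pub_public. The second function
needs the list of subscribed publications as input, which is what the
sync code reportedly does not receive.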

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
