Re: row filtering for logical replication

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira <euler(at)timbira(dot)com(dot)br>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2018-12-15 14:13:41
Message-ID: 74e4d796-20a9-21fd-0aba-6c0be6d10226@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14/12/2018 16:38, Stephen Frost wrote:
> Greetings,
>
> * Petr Jelinek (petr(dot)jelinek(at)2ndquadrant(dot)com) wrote:
>> On 23/11/2018 03:02, Stephen Frost wrote:
>>> * Euler Taveira (euler(at)timbira(dot)com(dot)br) wrote:
>>>> 2018-02-28 21:54 GMT-03:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>>>>> Good idea. I haven't read this yet, but one thing to make sure you've
>>>>> handled is limiting the clause to referencing only the current tuple and the
>>>>> catalogs. user-catalog tables are OK, too, anything that is
>>>>> RelationIsAccessibleInLogicalDecoding().
>>>>>
>>>>> This means only immutable functions may be invoked, since a stable or
>>>>> volatile function might attempt to access a table. And views must be
>>>>> prohibited or recursively checked. (We have tree walkers that would help
>>>>> with this).
>>>>>
>>>>> It might be worth looking at the current logic for CHECK expressions, since
>>>>> the requirements are similar. In my opinion you could safely not bother with
>>>>> allowing access to user catalog tables in the filter expressions and limit
>>>>> them strictly to immutable functions and the tuple its self.
>>>>
>>>> IIRC implementation is similar to RLS expressions. I'll check all of
>>>> these rules.
>>>
>>> Given the similarity to RLS and the nearby discussion about allowing
>>> non-superusers to create subscriptions, and probably publications later,
>>> I wonder if we shouldn't be somehow associating this with RLS policies
>>> instead of having the publication filtering be entirely independent..
>>
>> I do see the appeal here, if you consider logical replication to be a
>> streaming select it probably applies well.
>>
>> But given that this is happening inside output plugin which does not
>> have full executor setup and has catalog-only snapshot I am not sure how
>> feasible it is to try to merge these two things. As per my previous
>> email it's possible that we'll have to be stricter about what we allow
>> in expressions here.
>
> I can certainly understand the concern about trying to combine the
> implementation of this with that of RLS; perhaps that isn't a good fit
> due to the additional constraints put on logical decoding.
>
> That said, I still think it might make sense to consider these filters
> for logical decoding to be policies and, ideally, to allow users to use
> the same policy for both.
>

I am not against that as long as it's possible to have policy for
logical replication without having it for RLS and vice versa.

I also wonder if policies are flexible enough to allow for specifying
OLD and NEW - the replication filtering deals with DML, not with what's
visible, it might very well depend on differences between these (that's
something the current patch is missing as well BTW).

> In the end, the idea of having to build a single large and complex
> 'create publication' command which has a bunch of tables, each with
> their own filter clauses, just strikes me as pretty painful.
>
>> The other issue with merging this is that the use-case for filtering out
>> the data in logical replication is not necessarily about security, but
>> often about sending only relevant data. So it makes sense to have filter
>> on publication without RLS enabled on table and if we'd force that, we'd
>> limit usefulness of this feature.
>
> I definitely have a serious problem if we are going to say that you
> can't use this filtering for security-sensitive cases.

I am saying it should not be tied to only security sensitive cases,
because it has use cases that have nothing to do with security (ie, I
don't want this to depend on RLS being enabled for a table).

>
>> We definitely want to eventually create subscriptions as non-superuser
>> but that has zero effect on this as everything here is happening on
>> different server than where subscription lives (we already allow
>> creation of publications with just CREATE privilege on database and
>> ownership of the table).
>
> What I wasn't clear about above was the idea that we might allow a user
> other than the table owner to publish a given table, but that such a
> publication should certanily only be allowed to include the rows which
> that user has access to- as regulated by RLS. If the RLS policy is too
> complex to allow that then I would think we'd simply throw an error at
> the create publication time and the would-be publisher would need to
> figure that out with the table owner.

My opinion is that this is useful, but not necessarily something v1
patch needs to solve. Having too many publications and subscriptions to
various places is not currently practical anyway due to decoding
duplicating all the work for every connection.

>
> * Euler Taveira (euler(at)timbira(dot)com(dot)br) wrote:
>> Em sex, 23 de nov de 2018 às 11:40, Petr Jelinek
>> <petr(dot)jelinek(at)2ndquadrant(dot)com> escreveu:
>
>>> The other issue with merging this is that the use-case for filtering out
>>> the data in logical replication is not necessarily about security, but
>>> often about sending only relevant data. So it makes sense to have filter
>>> on publication without RLS enabled on table and if we'd force that, we'd
>>> limit usefulness of this feature.
>>
>> Use the same infrastructure as RLS could be a good idea but use RLS
>> for row filtering is not. RLS is complex.
>
> Right, this was along the lines I was thinking of- using the
> infrastructure and the policy system, in particular.
>

Yeah that part is definitely worth investigating.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2018-12-15 14:23:39 Re: row filtering for logical replication
Previous Message Alvaro Herrera 2018-12-15 13:21:16 Re: removal of dangling temp tables