Re: bogus: logical replication rows/cols combinations

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: bogus: logical replication rows/cols combinations
Date: 2022-05-01 21:42:34
Message-ID: 338e719c-4bc8-f40a-f701-e29543a264e4@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/29/22 07:05, Amit Kapila wrote:
> On Thu, Apr 28, 2022 at 5:56 PM Peter Eisentraut
> <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>>
>> On 27.04.22 12:33, Amit Kapila wrote:
>>> Currently, when the subscription has multiple publications, we combine
>>> the objects, and actions of those publications. It happens for
>>> 'publish_via_partition_root', publication actions, tables, column
>>> lists, or row filters. I think the whole design works on this idea
>>> even the initial table sync. I think it might need a major change
>>> (which I am not sure about at this stage) if we want to make the
>>> initial sync also behave similar to what you are proposing.
>>
>> If one publication says "publish if insert" and another publication says
>> "publish if update", then the combination of that is clearly "publish if
>> insert or update". Similarly, if one publication says "WHERE (foo)" and
>> one says "WHERE (bar)", then the combination is "WHERE (foo OR bar)".
>>
>> But if one publication says "publish columns a and b if condition-X" and
>> another publication says "publish columns a and c if not-condition-X",
>> then the combination is clearly *not* "publish columns a, b, c if true".
>> That is not logical, in the literal sense of that word.
>>
>
> So, what should be the behavior in the below cases:
>
> Case-1:
> pub1: "publish columns a and b if condition-X"
> pub2: "publish column c if condition-X"
>
> Isn't it okay to combine these?
>

Yes, I think it's reasonable to combine those. So the whole publication
will have

WHERE (condition-X)

and the column list will be (a,b,c).

> Case-2:
> pub1: "publish columns a and b if condition-X"
> pub2: "publish columns c if condition-Y"
>

In this case the publication will have

WHERE (condition-X or condition-Y)

and there will be different column filters for different row sets:

if (condition-X and condition-Y)
=> (a,b,c)
else if (condition-X and NOT condition-Y)
=> (a,b)
else if (condition-Y and NOT condition-X)
=> (c)

I think this behavior is reasonable, and it's what the patch does.

> Here Y is subset of condition X (say something like condition-X:
> "col > 5" and condition-Y: "col1 > 10").>
> What should we do in such a case?
>
> I think if there are some cases where combining them is okay but in
> other cases, it is not okay then it is better to prohibit 'not-okay'
> cases if that is feasible.
>

Not sure I understand what's the (supposed) issue with this example.
We'll simply do this:

if (col1 > 5 and col1 > 10)
=> (a,b,c)
else if (col1 > 5 and col1 <= 10)
=> (a,b)
else if (col1 > 10 and col1 <= 5)
=> (c)

Obviously, the third branch is unreachable, because the if condition can
never be satisfied, so we can never see only column list (c). But that's
fine IMO. When phrased using the CASE expressions (as in tablesync) it's
probably somewhat less cumbersome.

I think it's easier to think about this using "data redaction" example
where you specify which columns can be replicated under what condition.
Obviously, that's "orthogonal" in the sense that we specify column list
for a row filer condition, not row filter for a column. But in principle
it's the same thing, just different grammar.

And in that case it makes perfect sense that you don't blindly combine
the column lists from all publications, because that'd defeat the whole
point of filtering columns based on row filters.

Imagine have a table with customers from different regions, and you want
to replicate the data somewhere else, but for some reason you can only
replicate details for one particular region, and subset of columns for
everyone else. So you'd do something like this:

CREATE PUBLICATION p1 FOR TABLE customers (... all columns ...)
WHERE region = 'USA';

CREATE PUBLICATION p1 FOR TABLE customers (... subset of columns ...)
WHERE region != 'USA';

I think ignoring the row filters and just merging the column lists makes
no sense for this use case.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2022-05-01 21:57:16 Re: bogus: logical replication rows/cols combinations
Previous Message Tom Lane 2022-05-01 21:12:26 Re: Libpq single-row mode slowness