Re: bogus: logical replication rows/cols combinations

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: bogus: logical replication rows/cols combinations
Date: 2022-04-28 17:30:08
Message-ID: 765985d7-c464-2a73-6acf-268b341173aa@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/28/22 14:26, Peter Eisentraut wrote:
> On 27.04.22 12:33, Amit Kapila wrote:
>> Currently, when the subscription has multiple publications, we combine
>> the objects, and actions of those publications. It happens for
>> 'publish_via_partition_root', publication actions, tables, column
>> lists, or row filters. I think the whole design works on this idea
>> even the initial table sync. I think it might need a major change
>> (which I am not sure about at this stage) if we want to make the
>> initial sync also behave similar to what you are proposing.
>
> If one publication says "publish if insert" and another publication says
> "publish if update", then the combination of that is clearly "publish if
> insert or update".  Similarly, if one publication says "WHERE (foo)" and
> one says "WHERE (bar)", then the combination is "WHERE (foo OR bar)".
>
> But if one publication says "publish columns a and b if condition-X" and
> another publication says "publish columns a and c if not-condition-X",
> then the combination is clearly *not* "publish columns a, b, c if true".
>  That is not logical, in the literal sense of that word.
>
> I wonder how we handle the combination of
>
> pub1: publish=insert WHERE (foo)
> pub2: publish=update WHERE (bar)
>
> I think it would be incorrect if the combination is
>
> pub1, pub2: publish=insert,update WHERE (foo OR bar).

That's a good question, actually. No, we don't combine the publications
like this, the row filters are kept "per action". But the exact behavior
turns out to be rather confusing in this case.

(Note: This has nothing to do with column lists.)

Consider an example similar to what Alvaro posted earlier:

create table uno (a int primary key, b int, c int);

create publication uno for table uno where (a > 0)
with (publish='insert');

create publication dos for table uno where (a < 0)
with (publish='update');

And do this:

insert into uno values (1, 2, 3), (-1, 3, 4)

which on the subscriber produces just one row, because (a<0) replicates
only updates:

a | b | c
---+---+---
1 | 2 | 3
(1 row)

Now, let's update the (a<0) row.

update uno set a = 2 where a = -1;

It might seem reasonable to expect the updated row (2,3,4) to appear on
the subscriber, but no - that's not what happens. Because we have (a<0)
for UPDATE, and we evaluate this on the old row (matches) and new row
(does not match). And pgoutput_row_filter() decides the update needs to
be converted to DELETE, despite the old row was not replicated at all.

I'm not sure if pgoutput_row_filter() can even make reasonable decisions
with such configuration (combination of row filters, actions ...). But
it sure seems confusing, because if you just inserted the updated row,
it would get replicated.

Which brings me to a second problem, related to this one. Imagine you
create the subscription *after* inserting the two rows. In that case you
get this:

a | b | c
----+---+---
1 | 2 | 3
-1 | 3 | 4
(2 rows)

because tablesync.c ignores which actions is the publication (and thus
the rowfilter) defined for.

I think it's natural to expect that (INSERT + sync) and (sync + INSERT)
produce the same output on the subscriber.

I'm not sure we can actually make this perfectly sane with arbitrary
combinations of filters and actions. It would probably depend on whether
the actions are commutative, associative and stuff like that. But maybe
we can come up with restrictions that'd make this sane?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-04-28 19:35:24 Re: json_object returning jsonb reuslt different from returning json, returning text
Previous Message vignesh C 2022-04-28 17:07:22 Re: Handle infinite recursion in logical replication setup