From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Re: bogus: logical replication rows/cols combinations |
Date: | 2022-04-25 22:30:21 |
Message-ID: | 006d5ffc-1e7a-5cf2-c507-88eee6e0c14d@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 4/25/22 17:48, Alvaro Herrera wrote:
> I just noticed that publishing tables on multiple publications with
> different row filters and column lists has somewhat surprising behavior.
> To wit: if a column is published in any row-filtered publication, then
> the values for that column are sent to the subscriber even for rows that
> don't match the row filter, as long as the row matches the row filter
> for any other publication, even if that other publication doesn't
> include the column.
>
> Here's an example.
>
> Publisher:
>
> create table uno (a int primary key, b int, c int);
> create publication uno for table uno (a, b) where (a > 0);
> create publication dos for table uno (a, c) where (a < 0);
>
> Here, we specify: publish columns a,b for rows with positive a, and
> publish columns a,c for rows with negative a.
>
> What happened next will surprise you! Well, maybe not. On subscriber:
>
> create table uno (a int primary key, b int, c int);
> create subscription sub_uno connection 'port=55432 dbname=alvherre' publication uno,dos;
>
> Publisher:
> insert into uno values (1, 2, 3), (-1, 3, 4);
>
> Publication 'uno' only has columns a and b, so row with a=1 should not
> have value c=3. And publication 'dos' only has columns a and c, so row
> with a=-1 should not have value b=3. But, on subscriber:
>
> table uno;
> a │ b │ c
> ────┼───┼───
> 1 │ 2 │ 3
> -1 │ 3 │ 4
>
> q.e.d.
>
> I think results from a too simplistic view on how to mix multiple
> publications with row filters and column lists. IIRC we are saying "if
> column X appears in *any* publication, then the value is published",
> period, and don't stop to evaluate the row filter corresponding to each
> of those publications.
>
Right.
> The desired result on subscriber is:
>
> table uno;
> a │ b │ c
> ────┼───┼───
> 1 │ 2 │
> -1 │ │ 4
>
>
> Thoughts?
>
I'm not quite sure which of the two behaviors is more "desirable". In a
way, it's somewhat similar to publish_as_relid, which is also calculated
not considering which of the row filters match?
But maybe you're right and it should behave the way you propose ... the
example I have in mind is a use case replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything. Which
could be implemented as two publications
create publication sensitive_rows
for table t (a, b) where (is_sensitive);
create publication non_sensitive_rows
for table t where (not is_sensitive);
But the way it's implemented now, we'll always replicate all columns,
because the second publication has no column list.
Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).
We'd have to remove various other optimizations - for example we can't
just discard row filters if we found "no_filter" publication. Or more
precisely, we'd have to consider column lists too.
In other words, we'd have to merge pgoutput_column_list_init into
pgoutput_row_filter_init, and then modify pgoutput_row_filter to
evaluate the row filters one by one, and build the column list.
I can take a stab at it, but it seems strange to not apply the same
logic to evaluation of publish_as_relid. I wonder what Amit thinks about
this, as he wrote the row filter stuff.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-04-25 22:44:18 | Re: Cryptohash OpenSSL error queue in FIPS enabled builds |
Previous Message | Daniel Gustafsson | 2022-04-25 22:07:32 | Re: Cryptohash OpenSSL error queue in FIPS enabled builds |