Re: row filtering for logical replication

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Peter Smith" <smithpb2250(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Rahila Syed" <rahilasyed90(at)gmail(dot)com>, "Peter Eisentraut" <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "David Steele" <david(at)pgmasters(dot)net>, "Craig Ringer" <craig(at)2ndquadrant(dot)com>, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-07-14 00:57:44
Message-ID: 48e13621-10f7-49ef-bcbe-ed530195efd2@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> even if they use any other column with DELETE. Maybe it would be
> reasonable to test for this in the code and raise an error if the
> expression requires a column that's not part of the replica identity.
> (But that could be relaxed if the publication does not publish
> updates/deletes.)
I thought about it but came to the conclusion that it doesn't worth it. Even
with REPLICA IDENTITY FULL expression evaluates to false if the column allows
NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
because some row filter uses the column you want to remove from it.

> 2. For UPDATE, does the expression apply to the old tuple or to the new
> tuple? You say it's the new tuple, but from the user point of view I
> think it would make more sense that it would apply to the old tuple.
> (Of course, if you're thinking that the R.I. is the PK and the PK is
> never changed, then you don't really care which one it is, but I bet
> that some people would not like that assumption.)
New tuple. The main reason is that new tuple is always there for UPDATEs.
Hence, row filter might succeed even if the row filter contains a column that
is not part of PK or REPLICA IDENTITY. pglogical also chooses to use new tuple
when it is available (e.g. for INSERT and UPDATE). If you don't like this
approach we can (a) create a new publication option to choose between old tuple
and new tuple for UPDATEs or (b) qualify columns using a special reference
(such as NEW.id or OLD.foo). Both options can provide flexibility but (a) is
simpler.

> I think it is sensible that it's the old tuple that is matched, not the
> new; consider what happens if you change the PK in the update and the
> replica already has that tuple. If you match on the new tuple and it
> doesn't match the expression (so you filter out the update), but the old
> tuple does match the expression, then the replica will retain the
> mismatching tuple forever.
>
> 3. You say that a NULL value in any of those columns causes the
> expression to become false and thus the tuple is not published. This
> seems pretty unfriendly, but maybe it would be useful to have examples
> of the behavior. Does ExecInitCheck() handle things in the other way,
> and if so does using a similar trick give more useful behavior?
ExecInitCheck() is designed for CHECK constraints and SQL standard requires
taht NULL constraint conditions are not treated as errors. This feature uses a
WHERE clause and behaves like it. I mean, a NULL result does not return the
row. See ExecQual().

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-07-14 01:15:27 Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Previous Message David Rowley 2021-07-14 00:46:44 Re: enable_resultcache confusion