Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-18 05:32:18
Message-ID: CAHut+Pt+bOqOXnPMOP7fRWTb0qmN52w9Whbo9oqpko3ubDZd1g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Nov 10, 2021 at 12:36 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Mon, Nov 8, 2021 at 5:53 PM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > 3) v37-0005
> > >
> > > - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, FuncExpr
> > >
> > > I think there could be other node types which can also be considered
> > > simple expressions, for example T_NullIfExpr.
> >
> > The current walker restrictions are from a previously agreed decision
> > by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
> > keep everything very simple for a first version.
> >
> > Yes, you are right, there might be some additional node types that
> > might be fine, but at this time I don't want to add anything different
> > without getting their approval to do so. Anyway, additions like this
> > are all candidates for a future version of this row-filter feature.
> >
>
> I think we can consider T_NullIfExpr unless you see any problem with the same.

Added in v40 [1]

> Few comments on the latest set of patches (v39*)
> =======================================
...
> 0003*
> 3. In pgoutput_row_filter(), the patch is finding pub_relid when it
> should already be there in RelationSyncEntry->publish_as_relid found
> during get_rel_sync_entry call. Is there a reason to do this work
> again?

Fixed in v40 [1]

>
> 4. I think we should add some comments in pgoutput_row_filter() as to
> why we are caching the row_filter here instead of
> get_rel_sync_entry()? That has been discussed multiple times so it is
> better to capture that in comments.

Added comment in v40 [1]

>
> 5. Why do you need a separate variable rowfilter_valid to indicate
> whether a valid row filter exists? Why is exprstate not sufficient?
> Can you update comments to indicate why we need this variable
> separately?

I have improved the (existing) comment in v40 [1].
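
For readers following along, here is a minimal sketch of one common reason to keep such a flag next to a cached pointer. This reasoning is an assumption for illustration and is not quoted from the patch: a NULL pointer alone cannot distinguish "not looked up yet" from "looked up, and there is no filter". ToyFilterCache, ToyLookupFn, toy_get_filter and toy_lookup_none are hypothetical names and are not PostgreSQL structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry; NOT the real RelationSyncEntry. */
typedef struct ToyFilterCache
{
    bool        valid;      /* true once the lookup has run, whatever it found */
    const char *expr;       /* cached filter text; NULL means "no filter" */
    int         lookups;    /* how many times the expensive lookup ran */
} ToyFilterCache;

/* Hypothetical stand-in for an expensive catalog lookup. */
typedef const char *(*ToyLookupFn) (void);

/*
 * Return the cached filter, running the lookup only on first use.
 * Without the 'valid' flag, a relation with no filter (expr == NULL)
 * would trigger the lookup again on every call.
 */
static const char *
toy_get_filter(ToyFilterCache *cache, ToyLookupFn lookup)
{
    assert(cache != NULL && lookup != NULL);

    if (!cache->valid)
    {
        cache->expr = lookup();
        cache->lookups++;
        cache->valid = true;
    }
    return cache->expr;
}

/* A lookup that finds no filter at all. */
static const char *
toy_lookup_none(void)
{
    return NULL;
}
```

Calling toy_get_filter() twice with toy_lookup_none returns NULL both times but runs the lookup only once, which is the behaviour a bare pointer check could not provide.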

>
> 0004*
> 6. In rowfilter_expr_checker(), the expression tree is traversed
> twice, can't we traverse it once to detect all non-allowed stuff? It
> can be sometimes costly to traverse the tree multiple times especially
> when the expression is complex and it doesn't seem acceptable to do so
> unless there is some genuine reason for the same.

I doubt there would be any perceptible difference between 2
traversals and 1 because:
a) filters are limited to simple expressions. Yes, a large boolean
expression is possible but I don't think it is likely.
b) the validation is mostly a one-time cost, executed only when the
filter is created or changed.

Anyway, I am happy to try to refactor the logic to a single traversal
as suggested, but I'd like to combine those "validation" patches
(v40-0005, v40-0006) first, so I can combine their walker logic. Is it
OK?
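
To make the single-traversal idea concrete, here is a rough sketch of one walker that applies two checks per node in the same pass. It is only an illustration under stated assumptions: the Toy* types and functions are hypothetical and are not PostgreSQL's Node, NodeTag or expression_tree_walker(), and the second check (a limit on column numbers) is an invented stand-in for whatever the second validation pass really verifies. Only the list of allowed node kinds comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node tags; NOT PostgreSQL's real NodeTag. */
typedef enum ToyTag
{
    TOY_VAR,
    TOY_CONST,
    TOY_OPEXPR,
    TOY_BOOLEXPR,
    TOY_FUNCEXPR,
    TOY_NULLIFEXPR,
    TOY_SUBLINK             /* stands in for any disallowed node kind */
} ToyTag;

typedef struct ToyNode
{
    ToyTag          tag;
    int             varattno;   /* only meaningful for TOY_VAR */
    struct ToyNode *args[2];    /* up to two children; NULL if absent */
} ToyNode;

/* Walker context: the rule to apply plus counters for what was found. */
typedef struct ToyCheck
{
    int max_allowed_attno;  /* assumed rule: Vars must be <= this */
    int disallowed_nodes;   /* nodes whose kind is not allowed */
    int disallowed_vars;    /* Vars that break the assumed rule */
    int visited;            /* total nodes visited */
} ToyCheck;

/* Check 1: is this node kind in the allowed list? */
static bool
toy_tag_allowed(ToyTag tag)
{
    switch (tag)
    {
        case TOY_VAR:
        case TOY_CONST:
        case TOY_OPEXPR:
        case TOY_BOOLEXPR:
        case TOY_FUNCEXPR:
        case TOY_NULLIFEXPR:
            return true;
        default:
            return false;
    }
}

/* Visit every node exactly once, applying both checks as we go. */
static void
toy_check_walker(const ToyNode *node, ToyCheck *check)
{
    if (node == NULL)
        return;
    assert(check != NULL);

    check->visited++;

    if (!toy_tag_allowed(node->tag))
        check->disallowed_nodes++;
    else if (node->tag == TOY_VAR &&
             node->varattno > check->max_allowed_attno)
        check->disallowed_vars++;   /* check 2, same pass */

    for (size_t i = 0; i < 2; i++)
        toy_check_walker(node->args[i], check);
}
```

The sketch keeps walking below a disallowed node so that all problems are counted in one pass; a real checker would more likely raise an error at the first violation.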

>
> 7.
> +static void
> +rowfilter_expr_checker(Publication *pub, Node *rfnode, Relation rel)
>
> Keep the rel argument before whereclause as that makes the function
> signature better.

Fixed in v40 [1]

-----
[1] https://www.postgresql.org/message-id/CAHut%2BPv-D4rQseRO_OzfEz2dQsTKEnKjBCET9Z-iJppyT1XNMQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
