Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-07-16 05:49:53
Message-ID: CAA4eK1Jhp9=ZZZ2=ahpWbTiqrZcuu61FTD_moeq4AgQ4Z2Y4Qg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 16, 2021 at 10:11 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Jul 16, 2021 at 8:57 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 14, 2021 at 4:30 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
> > > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > > >
> > > > Is there some reasonable rule for which of the old/new tuples (or both)
> > > > to use for the WHERE condition? Or maybe it'd be handy to allow
> > > > referencing OLD/NEW as in triggers?
> > >
> > > I think for insert we are only allowing those rows to replicate which
> > > match the filter condition, so if we are updating any row then we
> > > should also maintain that sanity, right? That means at least on the NEW
> > > rows we should apply the filter, IMHO. That said, if a row that
> > > satisfied the filter was inserted and replicated, and we then update it
> > > with a new value that does not satisfy the filter, the update will not
> > > be replicated. I think that makes sense, because if an insert does not
> > > send a row that fails the filter to a replica, then why should an
> > > update do so?
> > >
> >
> > There is another theory in this regard: what if the old row
> > (created by the previous insert) was not sent to the subscriber because
> > it didn't match the filter, but after the update we decide to send it
> > because the updated row (new row) matches the filter condition? In
> > this case, I think it will generate an update conflict on the
> > subscriber, as the old row won't be present. As of now, we just skip
> > the update, but in the future we might have some conflict handling
> > there. If this is true, then even if the new row matches the filter,
> > there is no guarantee that it will be applied on the subscriber side
> > unless the old row also matches the filter.
>
> Yeah, it's a valid point.
>
> > Sure, there could be a
> > case where the user might have changed the filter between insert and
> > update, but maybe we can have a separate way to deal with such cases if
> > required, like providing some provision where the user can specify
> > whether they would like to match the old/new row in updates?
>
> Yeah, I think the best way is that users should get an option whether
> they want to apply the filter on the old row, on the new row, or
> both; in fact, they should be able to apply different filters to the
> old and new rows.
>

I am not so sure about different filters for old and new rows, but it
makes sense to apply the filter to both old and new rows by default,
and then also provide a way for the user to specify that the filter
should be applied to just the old row or just the new row.
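To make that proposed default concrete, here is a minimal sketch of the
per-UPDATE decision, assuming the row filter can be modelled as a simple
predicate over a row. The names UpdateFilterMode and
should_replicate_update are hypothetical, for illustration only; they are
not PostgreSQL APIs or settings.

```python
from enum import Enum
from typing import Any, Callable, Mapping

Row = Mapping[str, Any]


class UpdateFilterMode(Enum):
    # Hypothetical user-facing choice of which tuple the filter applies to.
    BOTH = "both"      # proposed default: old AND new must match
    OLD_ONLY = "old"   # filter evaluated on the old tuple only
    NEW_ONLY = "new"   # filter evaluated on the new tuple only


def should_replicate_update(pred: Callable[[Row], bool], old: Row, new: Row,
                            mode: UpdateFilterMode = UpdateFilterMode.BOTH) -> bool:
    """Decide whether one UPDATE is sent to the subscriber (sketch only)."""
    if mode is UpdateFilterMode.OLD_ONLY:
        return pred(old)
    if mode is UpdateFilterMode.NEW_ONLY:
        return pred(new)
    # Default: requiring the old row to match means the subscriber should
    # already have it (it passed the same filter when it was inserted).
    return pred(old) and pred(new)
```

With a filter like id > 100, an update that moves a row from id 50 to
id 150 is skipped under the default, because the subscriber never received
the old row; NEW_ONLY would send it and run into the missing-row conflict
described above.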

> I have one more thought in mind: currently, we are
> providing a filter for the publication table; doesn't it make sense to
> provide filters for each operation on the publication table? I mean
> different filters for insert, delete, the old row of an update, and
> the new row of an update.
>

Hmm, I think this sounds like a bit of a stretch, but if there is any
field use case then we can consider it in the future.

--
With Regards,
Amit Kapila.
