Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-29 12:10:06
Message-ID: CAA4eK1LgUkQW3GtLYkVPvpbWQ6LU1YrR-ovb7qs613ZdoA-YJQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 29, 2021 at 4:36 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Nov 29, 2021 at 3:41 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > > ---- Publisher:
> > > INSERT INTO tbl1 VALUES (1,1);
> > > UPDATE tbl1 SET a = 2;
> > >
> > > Prior to the UPDATE above:
> > > On pub side, tbl1 contains (1,1).
> > > On sub side, tbl1 contains (1,1)
> > >
> > > After the above UPDATE:
> > > On pub side, tbl1 contains (2,1).
> > > On sub side, tbl1 contains (1,1), (2,1)
> > >
> > > So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> > > the sub side.
> > >
> > > This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> > > "insert" filter "(b<2)" to determine whether the old value had been
> > > inserted (published to subscriber), but finds there is no "b" value
> > > (because it only uses RI cols for UPDATE) and so has to assume the old
> > > tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> > > an INSERT.
> > > Now if the use of RI cols were enforced for the insert filter case,
> > > we'd properly know the answer as to whether the old row value had been
> > > published and it would have correctly performed an UPDATE instead of
> > > an INSERT in this case.
> > >
> >
> > I don't think it is a good idea to combine the row-filter from the
> > publication that publishes just 'insert' with the row-filter that
> > publishes 'updates'. We shouldn't apply the 'insert' filter for
> > 'update', and similarly for the other publication operations. We can combine the
> > filters when the published operations are the same. So, this means
> > that we might need to cache multiple row-filters but I think that is
> > better than having another restriction that publish operation 'insert'
> > should also honor RI columns restriction.
>
> I am just wondering: if we don't combine the filters in the above case,
> what data will we send to the subscriber if the operation is
> "UPDATE tbl1 SET a = 2, b=3"? In this case we will apply only the
> update filter, i.e. a > 1, so this will become an INSERT
> operation because the old row did not pass the filter.
>

If we want, I think for inserts (new row) we can consider the insert
filter as well but that makes it tricky to explain. I feel we can
change it later as well if there is a valid use case for this. What do
you think?
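
To make the case above concrete, here is a rough sketch (not code from
the patch) of how an UPDATE could be classified when only the 'update'
row filter is applied. The function name classify_update and the dict
representation of rows are hypothetical. Only the "old row fails, new
row passes" case comes from this thread; the remaining branches are
assumptions added for illustration.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, int]
RowFilter = Callable[[Row], bool]


def classify_update(old_row: Optional[Row], new_row: Row,
                    update_filter: RowFilter) -> Optional[str]:
    """Return the change to publish for an UPDATE, or None to skip it.

    Hypothetical helper: it models the discussion, not the patch itself.
    """
    # An unavailable old row is treated as not matching the filter.
    old_ok = old_row is not None and update_filter(old_row)
    new_ok = update_filter(new_row)

    if old_ok and new_ok:
        return "UPDATE"
    if new_ok:
        # Case from the thread: the old row did not pass the filter,
        # so the subscriber is assumed not to have it.
        return "INSERT"
    if old_ok:
        # Assumption: the row leaves the published set.
        return "DELETE"
    # Assumption: neither version is published, nothing to send.
    return None


def update_filter(row: Row) -> bool:
    # The 'update' filter from the example above: a > 1
    return row["a"] > 1


# "UPDATE tbl1 SET a = 2, b=3" applied to the old row (1,1)
print(classify_update({"a": 1, "b": 1}, {"a": 2, "b": 3}, update_filter))
```

With the old row (1,1) and the new row (2,3), the old row fails a > 1
and the new row passes, so the sketch reports INSERT, which matches the
outcome described in the quoted question.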

--
With Regards,
Amit Kapila.
