Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-20 03:26:59
Message-ID: CAA4eK1JTn1NFnXbUFkLfdkwsUHKDA2MY9KiAxZsm=kgU-KgVwQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 20, 2021 at 6:07 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > >
> > > I think it's a concern, for such a basic example with only one row,
> > > getting unpredictable (and even wrong) replication results, depending
> > > upon the order of operations.
> > >
> >
> > I am not sure how we can deduce that. The results are based on current
> > and new values of row which is what I think we are expecting here.
> >
>
> In the two simple cases presented, the publisher ends up with the same
> single row (2,1) in both cases, but in one of the cases the subscriber
> ends up with an extra row (1,1) that the publisher doesn't have. So,
> in using a "filter", a new row has been published that the publisher
> doesn't have. I'm not so sure a user would be expecting that. Not to
> mention that if (1,1) is subsequently INSERTed on the publisher side,
> it will result in a duplicate key error on the publisher.
>

Personally, I feel users need to be careful in defining publications
and subscriptions; otherwise, there are various ways in which "duplicate
key error .." kinds of issues can arise. Say you have different
publications that publish the same table, and then you have different
subscriptions on the subscriber that subscribe to those publications.
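
To make that scenario concrete, here is a minimal simulation in plain
Python, not PostgreSQL code. It assumes two hypothetical publications
(pub_a, pub_b) that both include a table "t" with key column "a", and
two hypothetical subscriptions (sub_a, sub_b) on one subscriber. All
names are illustrative only.

```python
class DuplicateKeyError(Exception):
    """Stands in for the subscriber's unique-constraint violation."""


class SubscriberTable:
    """Toy subscriber-side table keyed by one column (assumed: "a")."""

    def __init__(self):
        self.rows = {}  # primary key value -> row

    def apply_insert(self, row, key_column="a"):
        key = row[key_column]
        if key in self.rows:
            raise DuplicateKeyError(f"duplicate key value: {key}")
        self.rows[key] = row


# Hypothetical setup: both publications include the same table "t".
publications = {"pub_a": {"t"}, "pub_b": {"t"}}
# Each subscription on the same subscriber follows one publication.
subscriptions = {"sub_a": "pub_a", "sub_b": "pub_b"}


def deliver_insert(table_name, row, target):
    """Send one publisher-side INSERT through every matching subscription."""
    outcomes = {}
    for sub, pub in subscriptions.items():
        if table_name not in publications[pub]:
            continue
        try:
            target.apply_insert(row)
            outcomes[sub] = "applied"
        except DuplicateKeyError as exc:
            outcomes[sub] = f"error: {exc}"
    return outcomes


subscriber_t = SubscriberTable()
outcomes = deliver_insert("t", {"a": 1, "b": 1}, subscriber_t)
```

In this sketch the same row arrives once per subscription, so the
second delivery fails on the key even though no row filter is involved.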

> > > Doesn't this problem result from allowing different WHERE clauses for
> > > different pubactions for the same table?
> > > My current thoughts are that this shouldn't be allowed, and also WHERE
> > > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > > using only columns covered by the replica identity or primary key.
> > >
> >
> > Hmm, even if we do that one could have removed the insert row filter
> > by the time we are evaluating the update. So, we will get the same
> > result. I think the behavior in your example is as we expect as per
> > the specs defined by the patch and I don't see any problem, in this
> > case, w.r.t replication results. Let us see what others think on this?
> >
>
> Here I'm talking about the typical use-case of setting the
> row-filtering WHERE clause up-front and not changing it thereafter.
> I think that dynamically changing filters after INSERT/UPDATE/DELETE
> operations is not the typical use-case, and IMHO it's another thing
> entirely (could result in all kinds of unpredictable, random results).
>

Yeah, that's what I also wanted to say, but users need to define
publications/subscriptions carefully; otherwise, even an up-front
definition can lead to unpredictable results, as shown in the
explanation above. I feel Hou-San's latest email [1] explains the
current rules very well, and maybe we should document them in some way
to avoid confusion.
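
As a rough illustration of what such documentation could spell out,
the sketch below evaluates one filter against the old and new versions
of an updated row and picks the operation to send. This is an
assumption about the rules, modeled on the old-row/new-row behavior
discussed in this thread; it is not quoted from [1], and the function
name and the filter "a > 1" are hypothetical.

```python
def transform_update(row_filter, old_row, new_row):
    """Decide what an UPDATE becomes after filtering (assumed rules).

    old matches, new matches   -> replicate as UPDATE
    old no match, new matches  -> replicate as INSERT
    old matches, new no match  -> replicate as DELETE
    neither matches            -> replicate nothing (None)
    """
    old_ok = row_filter(old_row)
    new_ok = row_filter(new_row)
    if old_ok and new_ok:
        return "UPDATE"
    if new_ok:
        return "INSERT"
    if old_ok:
        return "DELETE"
    return None


def a_greater_than_one(row):
    # Hypothetical row filter, equivalent to WHERE (a > 1).
    return row["a"] > 1


# The row moves into the filtered set, so the subscriber sees an INSERT.
entering = transform_update(a_greater_than_one, {"a": 1, "b": 1}, {"a": 2, "b": 1})
# The row moves out of the filtered set, so the subscriber sees a DELETE.
leaving = transform_update(a_greater_than_one, {"a": 2, "b": 1}, {"a": 1, "b": 1})
```

Under these assumed rules the outcome depends only on the old and new
row values, which is why the order of operations on the publisher can
change what the subscriber ends up with.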

> Personally I think it would make more sense to:
> 1) Disallow different WHERE clauses on the same table, for different pubactions.
> 2) If only INSERTs are being published, allow any column in the WHERE
> clause, otherwise (as for UPDATE and DELETE) restrict the referenced
> columns to be part of the replica identity or primary key.
>

We can restrict it in some way as you are saying, or we can even
restrict it such that we "disallow specifying row filters unless
pubactions have all the DML operations and allow the row filter to have
columns that are part of the replica identity or primary key". I feel
it is better to provide flexibility, as the current patch does, and
document it to make users aware of the kinds of problems that can arise
with wrong usage.

[1] - https://www.postgresql.org/message-id/OS0PR01MB57168F4384D50656A4FC2DC5947B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.
