Re: long-standing data loss bug in initial sync of logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-08-20 10:40:22
Message-ID: CAA4eK1+1mQ6DY+Ext6MZvBBzko9SgQO+Ve2G5jnnfjH6RfWK6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 15, 2024 at 9:31 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 8 Aug 2024 at 16:24, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Wed, 31 Jul 2024 at 11:17, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > >
> >
> > Created a patch for distributing invalidations.
> > Here we collect the invalidation messages for the current transaction
> > and distribute it to all the inprogress transactions, whenever we are
> > distributing the snapshots..Thoughts?
>
> Since we are applying invalidations to all in-progress transactions,
> the publisher will only replicate half of the transaction data up to
> the point of invalidation, while the remaining half will not be
> replicated.
> Ex:
> Session1:
> BEGIN;
> INSERT INTO tab_conc VALUES (1);
>
> Session2:
> ALTER PUBLICATION regress_pub1 DROP TABLE tab_conc;
>
> Session1:
> INSERT INTO tab_conc VALUES (2);
> INSERT INTO tab_conc VALUES (3);
> COMMIT;
>
> After the above the subscriber data looks like:
> postgres=# select * from tab_conc ;
> a
> ---
> 1
> (1 row)
>
> You can reproduce the issue using the attached test.
> I'm not sure if this behavior is ok. At present, we’ve replicated the
> first record within the same transaction, but the second and third
> records are being skipped.
>

This can happen even without a concurrent DDL if some of the tables in
the database are part of the publication and others are not. In such a
case inserts for publicized tables will be replicated but other
inserts won't. Sending the partial data of the transaction isn't a
problem to me. Do you have any other concerns that I am missing?

> Would it be better to apply invalidations
> after the transaction is underway?
>

But that won't fix the problem reported by Sawada-san in an email [1].

BTW, we should do some performance testing by having a mix of DML and
DDLs to see the performance impact of this patch.

[1] - https://www.postgresql.org/message-id/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-08-20 11:15:24 RE: Conflict detection and logging in logical replication
Previous Message Peter Eisentraut 2024-08-20 10:38:25 Re: Virtual generated columns