From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: long-standing data loss bug in initial sync of logical replication |
Date: | 2024-08-20 10:40:22 |
Message-ID: | CAA4eK1+1mQ6DY+Ext6MZvBBzko9SgQO+Ve2G5jnnfjH6RfWK6A@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Aug 15, 2024 at 9:31 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 8 Aug 2024 at 16:24, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Wed, 31 Jul 2024 at 11:17, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > >
> >
> > Created a patch for distributing invalidations.
> > Here we collect the invalidation messages for the current transaction
> > and distribute them to all the in-progress transactions whenever we
> > distribute the snapshots. Thoughts?
>
> Since we are applying invalidations to all in-progress transactions,
> the publisher will replicate only the part of the transaction that
> precedes the invalidation, while the changes made after it will not
> be replicated.
> Ex:
> Session1:
> BEGIN;
> INSERT INTO tab_conc VALUES (1);
>
> Session2:
> ALTER PUBLICATION regress_pub1 DROP TABLE tab_conc;
>
> Session1:
> INSERT INTO tab_conc VALUES (2);
> INSERT INTO tab_conc VALUES (3);
> COMMIT;
>
> After the above the subscriber data looks like:
> postgres=# select * from tab_conc ;
> a
> ---
> 1
> (1 row)
>
> You can reproduce the issue using the attached test.
> I'm not sure if this behavior is ok. At present, we’ve replicated the
> first record within the same transaction, but the second and third
> records are being skipped.
>
This can happen even without concurrent DDL if only some of the tables
in the database are part of the publication. In that case, inserts into
published tables are replicated but inserts into the other tables are
not. Sending only part of a transaction's data doesn't look like a
problem to me. Do you have any other concerns that I am missing?
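
To illustrate the no-DDL case, here is a minimal sketch. The table and publication names are hypothetical (they are not from the attached test), and a subscription to pub_partial is assumed to exist before the transaction runs:

```sql
-- Hypothetical setup: only tab_pub is part of the publication.
CREATE TABLE tab_pub (a int);
CREATE TABLE tab_nopub (a int);
CREATE PUBLICATION pub_partial FOR TABLE tab_pub;

-- A single transaction that touches both tables.
BEGIN;
INSERT INTO tab_pub VALUES (1);    -- table is in the publication: replicated
INSERT INTO tab_nopub VALUES (1);  -- table is not in the publication: not replicated
COMMIT;
```

Under these assumptions, the subscriber would receive only the tab_pub change, so it already sees just part of this transaction's data.
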
> Would it be better to apply invalidations
> after the transaction is underway?
>
But that won't fix the problem reported by Sawada-san in an email [1].
BTW, we should test this patch with a workload that mixes DML and DDL
to measure its performance impact.
--
With Regards,
Amit Kapila.