Re: long-standing data loss bug in initial sync of logical replication

From: Nitin Motiani <nitinmotiani(at)google(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-09-10 08:50:50
Message-ID: CAH5HC94xErGnFnnrs+=nJtCOSsXZiLBa_1YenCgiOLf=f=nh8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 5, 2024 at 4:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Sep 2, 2024 at 9:19 PM Nitin Motiani <nitinmotiani(at)google(dot)com> wrote:
> >
> > I think that the partial data replication for one table is a bigger
> > issue than the case of data being sent for a subset of the tables in
> > the transaction. This can lead to inconsistent data if the same row is
> > updated multiple times or deleted in the same transaction. In such a
> > case if only the partial updates from the transaction are sent to the
> > subscriber, it might end up with the data which was never visible on
> > the publisher side.
> >
> > Here is an example I tried with the patch v8-001 :
> >
> > I created following 2 tables on the publisher and the subscriber :
> >
> > CREATE TABLE delete_test(id int primary key, name varchar(100));
> > CREATE TABLE update_test(id int primary key, name varchar(100));
> >
> > I added both the tables to the publication p on the publisher and
> > created a subscription s on the subscriber.
> >
> > I run 2 sessions on the publisher and do the following :
> >
> > Session 1 :
> > BEGIN;
> > INSERT INTO delete_test VALUES(0, 'Nitin');
> >
> > Session 2 :
> > ALTER PUBLICATION p DROP TABLE delete_test;
> >
> > Session 1 :
> > DELETE FROM delete_test WHERE id=0;
> > COMMIT;
> >
> > After the commit there should be no new row created on the publisher.
> > But because the partial data was replicated, this is what the select
> > on the subscriber shows :
> >
> > SELECT * FROM delete_test;
> > id | name
> > ----+-----------
> > 0 | Nitin
> > (1 row)
> >
> > I don't think the above is a common use case. But this is still an
> > issue because the subscriber has the data which never existed on the
> > publisher.
> >
>
> I don't think that is the correct conclusion because the user has
> intentionally avoided sending part of the transaction changes. This
> can happen in various ways without the patch as well. For example, if
> the user has performed the ALTER in the same transaction.
>
> Publisher:
> =========
> BEGIN
> postgres=*# Insert into delete_test values(0, 'Nitin');
> INSERT 0 1
> postgres=*# Alter Publication pub1 drop table delete_test;
> ALTER PUBLICATION
> postgres=*# Delete from delete_test where id=0;
> DELETE 1
> postgres=*# commit;
> COMMIT
> postgres=# select * from delete_test;
> id | name
> ----+------
> (0 rows)
>
> Subscriber:
> =========
> postgres=# select * from delete_test;
> id | name
> ----+-------
> 0 | Nitin
> (1 row)
>
> This can also happen when the user has published only 'inserts' but
> not 'updates' or 'deletes'.
>

Thanks for the clarification. I didn't think of this case. The change
seems fine if this can already happen.

Thanks & Regards
Nitin Motiani
Google

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-09-10 08:51:39 Re: First draft of PG 17 release notes
Previous Message Amit Kapila 2024-09-10 08:46:27 Re: Invalid Assert while validating REPLICA IDENTITY?