Re: long-standing data loss bug in initial sync of logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Nitin Motiani <nitinmotiani(at)google(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-09-05 10:34:16
Message-ID: CAA4eK1+cUsnAL6jF4Lny5KiBz5iZv1S+38=zHZHskgo_ZRYMMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 2, 2024 at 9:19 PM Nitin Motiani <nitinmotiani(at)google(dot)com> wrote:
>
> I think that the partial data replication for one table is a bigger
> issue than the case of data being sent for a subset of the tables in
> the transaction. This can lead to inconsistent data if the same row is
> updated multiple times or deleted in the same transaction. In such a
> case if only the partial updates from the transaction are sent to the
> subscriber, it might end up with the data which was never visible on
> the publisher side.
>
> Here is an example I tried with the patch v8-001 :
>
> I created following 2 tables on the publisher and the subscriber :
>
> CREATE TABLE delete_test(id int primary key, name varchar(100));
> CREATE TABLE update_test(id int primary key, name varchar(100));
>
> I added both the tables to the publication p on the publisher and
> created a subscription s on the subscriber.
>
> I run 2 sessions on the publisher and do the following :
>
> Session 1 :
> BEGIN;
> INSERT INTO delete_test VALUES(0, 'Nitin');
>
> Session 2 :
> ALTER PUBLICATION p DROP TABLE delete_test;
>
> Session 1 :
> DELETE FROM delete_test WHERE id=0;
> COMMIT;
>
> After the commit there should be no new row created on the publisher.
> But because the partial data was replicated, this is what the select
> on the subscriber shows :
>
> SELECT * FROM delete_test;
> id | name
> ----+-----------
> 0 | Nitin
> (1 row)
>
> I don't think the above is a common use case. But this is still an
> issue because the subscriber has the data which never existed on the
> publisher.
>

I don't think that is the correct conclusion because the user has
intentionally avoided sending part of the transaction changes. This
can happen in various ways without the patch as well. For example, if
the user has performed the ALTER in the same transaction.

Publisher:
=========
BEGIN
postgres=*# Insert into delete_test values(0, 'Nitin');
INSERT 0 1
postgres=*# Alter Publication pub1 drop table delete_test;
ALTER PUBLICATION
postgres=*# Delete from delete_test where id=0;
DELETE 1
postgres=*# commit;
COMMIT
postgres=# select * from delete_test;
id | name
----+------
(0 rows)

Subscriber:
=========
postgres=# select * from delete_test;
id | name
----+-------
0 | Nitin
(1 row)

This can also happen when the user has published only 'inserts' but
not 'updates' or 'deletes'.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-09-05 11:32:34 Re: optimizing pg_upgrade's once-in-each-database steps
Previous Message Amit Langote 2024-09-05 09:55:47 Re: generic plans and "initial" pruning