Re: long-standing data loss bug in initial sync of logical replication

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-08-09 11:20:31
Message-ID: CANhcyEVtnmew_DU4bP=+=GVi2_8-B6tnkO=qz3hPMgx422u+8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 8 Aug 2024 at 16:24, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> On Wed, 31 Jul 2024 at 11:17, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Wed, 31 Jul 2024 at 09:36, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jul 31, 2024 at 3:27 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jul 24, 2024 at 9:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Wed, Jul 17, 2024 at 5:25 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Wed, 17 Jul 2024 at 11:54, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 16, 2024 at 6:54 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > BTW, I noticed that we don't take any table-level locks for Create
> > > > > > > Publication .. For ALL TABLES (and Drop Publication). Can that create
> > > > > > > a similar problem? I haven't tested so not sure but even if there is a
> > > > > > > problem for the Create case, it should lead to some ERROR like missing
> > > > > > > publication.
> > > > > >
> > > > > > I tested these scenarios, and as you expected, it throws an error for
> > > > > > the create publication case:
> > > > > > 2024-07-17 14:50:01.145 IST [481526] 481526 ERROR: could not receive
> > > > > > data from WAL stream: ERROR: publication "pub1" does not exist
> > > > > > CONTEXT: slot "sub1", output plugin "pgoutput", in the change
> > > > > > callback, associated LSN 0/1510CD8
> > > > > > 2024-07-17 14:50:01.147 IST [481450] 481450 LOG: background worker
> > > > > > "logical replication apply worker" (PID 481526) exited with exit code
> > > > > > 1
> > > > > >
> > > > > > The steps for this process are as follows:
> > > > > > 1) Create tables in both the publisher and subscriber.
> > > > > > 2) On the publisher: Create a replication slot.
> > > > > > 3) On the subscriber: Create a subscription using the slot created by
> > > > > > the publisher.
> > > > > > 4) On the publisher:
> > > > > > 4.a) Session 1: BEGIN; INSERT INTO T1;
> > > > > > 4.b) Session 2: CREATE PUBLICATION FOR ALL TABLES
> > > > > > 4.c) Session 1: COMMIT;
> > > > > >
> > > > > > Since we are throwing out a "publication does not exist" error, there
> > > > > > is no inconsistency issue here.
> > > > > >
> > > > > > However, an issue persists with DROP ALL TABLES publication, where
> > > > > > data continues to replicate even after the publication is dropped.
> > > > > > This happens because the open transaction consumes the invalidation,
> > > > > > causing the publications to be revalidated using old snapshot. As a
> > > > > > result, both the open transactions and the subsequent transactions are
> > > > > > getting replicated.
> > > > > >
> > > > > > We can reproduce this issue by following these steps in a logical
> > > > > > replication setup with an "ALL TABLES" publication:
> > > > > > On the publisher:
> > > > > > Session 1: BEGIN; INSERT INTO T1 VALUES (val1);
> > > > > > In another session on the publisher:
> > > > > > Session 2: DROP PUBLICATION
> > > > > > Back in Session 1 on the publisher:
> > > > > > COMMIT;
> > > > > > Finally, in Session 1 on the publisher:
> > > > > > INSERT INTO T1 VALUES (val2);
> > > > > >
> > > > > > Even after dropping the publication, both val1 and val2 are still
> > > > > > being replicated to the subscriber. This means that both the
> > > > > > in-progress concurrent transaction and the subsequent transactions are
> > > > > > being replicated.
> > > > > >
> > > > > > I don't think locking all tables is a viable solution in this case, as
> > > > > > it would require asking the user to refrain from performing any
> > > > > > operations on any of the tables in the database while creating a
> > > > > > publication.
> > > > > >
> > > > >
> > > > > Indeed, locking all tables in the database to prevent concurrent DMLs
> > > > > for this scenario also looks odd to me. The other alternative
> > > > > previously suggested by Andres is to distribute catalog modifying
> > > > > transactions to all concurrent in-progress transactions [1] but as
> > > > > mentioned this could add an overhead. One possibility to reduce
> > > > > overhead is that we selectively distribute invalidations for
> > > > > catalogs-related publications but I haven't analyzed the feasibility.
> > > > >
> > > > > We need more opinions to decide here, so let me summarize the problem
> > > > > and solutions discussed. As explained with an example in an email [1],
> > > > > the problem related to logical decoding is that it doesn't process
> > > > > invalidations corresponding to DDLs for the already in-progress
> > > > > transactions. We discussed preventing DMLs in the first place when
> > > > > concurrent DDLs like ALTER PUBLICATION ... ADD TABLE ... are in
> > > > > progress. The solution discussed was to acquire
> > > > > ShareUpdateExclusiveLock for all the tables being added via such
> > > > > commands. Further analysis revealed that the same handling is required
> > > > > for ALTER PUBLICATION ... ADD TABLES IN SCHEMA which means locking all
> > > > > the tables in the specified schemas. Then DROP PUBLICATION also seems
> > > > > to have similar symptoms which means in the worst case (where
> > > > > publication is for ALL TABLES) we have to lock all the tables in the
> > > > > database. We are not sure if that is good so the other alternative we
> > > > > can pursue is to distribute invalidations in logical decoding
> > > > > infrastructure [1] which has its downsides.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Thank you for summarizing the problem and solutions!
> > > >
> > > > I think it's worth trying the idea of distributing invalidation
> > > > messages, and we will see if there could be overheads or any further
> > > > obstacles. IIUC this approach would resolve another issue we discussed
> > > > before too[1].
> > > >
> > >
> > > Yes, and we also discussed having a similar solution at the time when
> > > that problem was reported. So, it is clear that even though locking
> > > tables can work for commands alter ALTER PUBLICATION ... ADD TABLE
> > > ..., we need a solution for distributing invalidations to the
> > > in-progress transactions during logical decoding for other cases as
> > > reported by you previously.
> > >
> > > Thanks for looking into this.
> > >
> >
> > Thanks, I am working on to implement a solution for distributing
> > invalidations. Will share a patch for the same.
>
> Created a patch for distributing invalidations.
> Here we collect the invalidation messages for the current transaction
> and distribute it to all the inprogress transactions, whenever we are
> distributing the snapshots..Thoughts?

In the v7 patch, I am looping through the reorder buffer of the
current committed transaction and storing all invalidation messages in
a list. Then I am distributing those invalidations.
But I found that for a transaction we already store all the
invalidation messages (see [1]). So we don't need to loop through the
reorder buffer and store the invalidations.

I have modified the patch accordingly and attached the same.

[1]: https://github.com/postgres/postgres/blob/7da1bdc2c2f17038f2ae1900be90a0d7b5e361e0/src/include/replication/reorderbuffer.h#L384

Thanks and Regards,
Shlok Kyal

Attachment Content-Type Size
v8-0001-Distribute-invalidatons-if-change-in-catalog-tabl.patch application/octet-stream 9.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2024-08-09 11:41:26 Re: Remaining dependency on setlocale()
Previous Message Junwang Zhao 2024-08-09 10:19:11 Re: [Patch] remove duplicated smgrclose