Re: long-standing data loss bug in initial sync of logical replication

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-10-07 11:15:03
Message-ID: CANhcyEUcmzFRcHre_1G7OiuKs98-9qbd8UQsfJHf3qE6TB8nrg@mail.gmail.com
Lists: pgsql-hackers

On Fri, 4 Oct 2024 at 12:52, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> Hi Kuroda-san,
>
> Thanks for reviewing the patch.
> >
> > 1.
> > I feel the name of SnapBuildDistributeNewCatalogSnapshot() should be updated because it
> > distributes two objects: catalog snapshot and invalidation messages. Do you have good one
> > in your mind? I considered "SnapBuildDistributeNewCatalogSnapshotAndInValidations" or
> > "SnapBuildDistributeItems" but seems not good :-(.
>
> I have renamed the function to 'SnapBuildDistributeSnapshotAndInval'. Thoughts?
>
> > 2.
> > Hmm, still, it is overengineering for me to add a new type of invalidation message
> > only for the publication. According to the ExecRenameStmt() we can implement an
> > arbitrary rename function like RenameConstraint() and RenameDatabase().
> > Regarding the ALTER PUBLICATION OWNER TO, I feel adding CacheInvalidateRelcacheAll()
> > and InvalidatePublicationRels() is enough.
>
> I agree with you.
>
> >
> > I attached a PoC which implements above. It could pass tests on my env. Could you
> > please see it tell me how you think?
>
> I have tested the POC and it is working as expected. The changes look
> fine to me. I have created a patch for the same.
> Currently, we are passing 'PUBLICATION_PART_ALL' as an argument to
> function 'GetPublicationRelations' and
> 'GetAllSchemaPublicationRelations'. Need to check if we can use
> 'PUBLICATION_PART_ROOT' or 'PUBLICATION_PART_LEAF' depending on the
> 'publish_via_partition_root' option. Will test and address this in the
> next version of the patch. For now, I have added a TODO.

I have tested this part. I observed that whenever we insert data into a
partitioned table, the function 'get_rel_sync_entry' is called and a
hash entry is created for the relid of the corresponding leaf partition.
So I feel that while invalidating here we can specify
'PUBLICATION_PART_LEAF'. I have made the corresponding changes in the
0002 patch.
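
To illustrate the reasoning above, here is a minimal toy model in C. It is
not PostgreSQL code: 'ToyEntry', 'toy_get_entry' and 'toy_invalidate' are
hypothetical names, and the fixed-size array only stands in for the hash
table that 'get_rel_sync_entry' fills. It assumes, as observed above, that
entries are keyed by the relid of the leaf partition that receives the
change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

#define MAX_ENTRIES 8

/* Hypothetical cache entry; NOT the real RelationSyncEntry. */
typedef struct ToyEntry
{
    Oid     relid;              /* key: relid of the relation that was changed */
    bool    used;               /* slot is occupied */
    bool    valid;              /* cleared by invalidation */
} ToyEntry;

ToyEntry toy_cache[MAX_ENTRIES];

/*
 * Look up the entry for relid, creating it on first use. This mimics the
 * observation that an entry appears for the leaf partition on insert.
 */
ToyEntry *
toy_get_entry(Oid relid)
{
    ToyEntry   *free_slot = NULL;

    for (size_t i = 0; i < MAX_ENTRIES; i++)
    {
        if (toy_cache[i].used && toy_cache[i].relid == relid)
            return &toy_cache[i];
        if (!toy_cache[i].used && free_slot == NULL)
            free_slot = &toy_cache[i];
    }

    assert(free_slot != NULL);  /* toy cache must not be full */
    free_slot->relid = relid;
    free_slot->used = true;
    free_slot->valid = true;
    return free_slot;
}

/*
 * Mark the entries for the given relids invalid.
 * Returns how many valid entries were actually hit.
 */
int
toy_invalidate(const Oid *relids, size_t nrelids)
{
    int         hits = 0;

    for (size_t r = 0; r < nrelids; r++)
    {
        for (size_t i = 0; i < MAX_ENTRIES; i++)
        {
            if (toy_cache[i].used && toy_cache[i].valid &&
                toy_cache[i].relid == relids[r])
            {
                toy_cache[i].valid = false;
                hits++;
            }
        }
    }
    return hits;
}
```

In this model, take a partitioned table with root relid 100 and leaf relids
101 and 102 (made-up values). After inserts touch both leaves, calling
toy_invalidate() with only the root relid hits nothing, while calling it
with the leaf relids invalidates both entries. That is the intuition for
passing 'PUBLICATION_PART_LEAF'.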

I have also modified the tests in the 0001 patch. These changes are only
related to the syntax used for writing the tests.

Thanks and Regards,
Shlok Kyal

Attachment Content-Type Size
v13-0002-Selective-Invalidation-of-Cache.patch application/octet-stream 7.2 KB
v13-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch application/octet-stream 13.8 KB
