Re: long-standing data loss bug in initial sync of logical replication

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-10-03 05:42:47
Message-ID: CANhcyEWEXL3rxvKH9-Xtx-DgGX0D62EktHpW+nG+MSSaMVUVig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > I have addressed the comment for 0002 patch and attached the patches.
> > Also, I have moved the tests in the 0002 to 0001 patch.
>
> Thanks for updating the patch. 0002 patch seems to remove cache invalidations
> from publication_invalidation_cb(). Related with it, I found an issue and had a concern.
>
> 1.
> The replication continues even after ALTER PUBLICATION RENAME is executed.
> For example - assuming that a subscriber subscribes only "pub":
>
> ```
> pub=# INSERT INTO tab values (1);
> INSERT 0 1
> pub=# ALTER PUBLICATION pub RENAME TO pub1;
> ALTER PUBLICATION
> pub=# INSERT INTO tab values (2);
> INSERT 0 1
>
> sub=# SELECT * FROM tab ; -- (2) should not be replicated however...
> a
> ---
> 1
> 2
> (2 rows)
> ```
>
> This happens because 1) ALTER PUBLICATION RENAME statement won't be invalidate the
> relation cache, and 2) publications are reloaded only when invalid RelationSyncEntry
> is found. In given example, the first INSERT creates the valid cache and second
> INSERT reuses it. Therefore, the pubname-check is skipped.
>
> For now, the actual renaming is done at AlterObjectRename_internal(), a generic
> function. I think we must implement a dedecated function to publication and must
> invalidate relcaches there.
>
> 2.
> Similarly with above, the relcache won't be invalidated when ALTER PUBLICATION
> OWNER TO is executed. This means that privilage checks may be ignored if the entry
> is still valid. Not sure, but is there a possibility this causes an inconsistency?
>

Hi Kuroda-san,

Thanks for testing the patch. I have fixed the comments and attached
the updated patch.
I have added a callback function rel_sync_cache_publicationrel_cb().
This callback invalidates the cache of tables in a particular
publication.
This callback is called when there is some modification in
pg_publication catalog.

I have tested the two cases 'ALTER PUBLICATION ... RENAME TO ...' and
'ALTER PUBLICATION ... OWNER TO ...' and debugged it. The newly
added callback is called and it invalidates the cache of tables
present in that particular publication.
I have also added a test related to 'ALTER PUBLICATION ... RENAME TO
...' to 0001 patch.

Thanks and Regards,
Shlok Kyal

Attachment Content-Type Size
v11-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch application/octet-stream 13.9 KB
v11-0002-Selective-Invalidation-of-Cache.patch application/octet-stream 14.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2024-10-03 05:53:04 Re: long-standing data loss bug in initial sync of logical replication
Previous Message Amit Langote 2024-10-03 05:24:23 Re: Address the -Wuse-after-free warning in ATExecAttachPartition()