From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Re: long-standing data loss bug in initial sync of logical replication |
Date: | 2024-09-27 11:24:46 |
Message-ID: | CANhcyEVVtR4HUTUXDs5Ybg_v+YYVtw95ZRfzAeZSXkbqreWW1g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Kuroda-san,
Thanks for reviewing the patch.
> > Solution:
> > 1. When we alter a publication using commands like ‘ALTER PUBLICATION
> > pub_name DROP TABLE table_name’, first all tables in the publications
> > are invalidated using the function ‘rel_sync_cache_relation_cb’. Then
> > again ‘rel_sync_cache_publication_cb’ function is called which
> > invalidates all the tables.
>
> On my environment, rel_sync_cache_publication_cb() was called first and invalidate
> all the entries, then rel_sync_cache_relation_cb() was called and the specified
> entry is invalidated - hence second is NO-OP.
>
You are correct. I made a silly mistake while writing the write-up.
rel_sync_cache_publication_cb() is called first and invalidate all the
entries, then rel_sync_cache_relation_cb() is called and the specified
entry is invalidated
> > This happens because of the following
> > callback registered:
> > CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
> > rel_sync_cache_publication_cb, (Datum) 0);
>
> But even in this case, I could understand that you want to remove the
> rel_sync_cache_publication_cb() callback.
Yes, I think rel_sync_cache_publication_cb() callback can be removed,
as it is invalidating all the other tables as well (which are not in
this publication).
> > 2. When we add/drop a schema to/from a publication using command like
> > ‘ALTER PUBLICATION pub_name ADD TABLES in SCHEMA schema_name’, first
> > all tables in that schema are invalidated using
> > ‘rel_sync_cache_relation_cb’ and then again
> > ‘rel_sync_cache_publication_cb’ function is called which invalidates
> > all the tables.
>
> Even in this case, rel_sync_cache_publication_cb() was called first and then
> rel_sync_cache_relation_cb().
>
Yes, your observation is correct. rel_sync_cache_publication_cb() is
called first and then rel_sync_cache_relation_cb().
> >
> > 3. When we alter a namespace using command like ‘ALTER SCHEMA
> > schema_name RENAME to new_schema_name’ all the table in cache are
> > invalidated as ‘rel_sync_cache_publication_cb’ is called due to the
> > following registered callback:
> > CacheRegisterSyscacheCallback(NAMESPACEOID,
> > rel_sync_cache_publication_cb, (Datum) 0);
> >
> > So, we added a new callback function ‘rel_sync_cache_namespacerel_cb’
> > will be called instead of function ‘rel_sync_cache_publication_cb’ ,
> > which invalidates only the cache of the tables which are part of that
> > particular namespace. For the new function the ‘namespace id’ is added
> > in the Invalidation message.
>
> Hmm, I feel this fix is too much. Unlike ALTER PUBLICATION statements, I think
> ALTER SCHEMA is rarely executed at the production stage. However, this approach
> requires adding a new cache callback system, which affects the entire postgres
> system; this is not very beneficial compared to the outcome. It should be discussed
> on another thread to involve more people, and then we can add the improvement
> after being accepted.
>
Yes, I also agree with you. I have removed the changes in the updated patch.
> > Performance Comparison:
> > I have run the same tests as shared in [1] and observed a significant
> > decrease in the degradation with the new changes. With selective
> > invalidation degradation is around ~5%. This results are an average of
> > 3 runs.
>
> IIUC, the executed workload did not contain ALTER SCHEMA command, so
> third improvement did not contribute this improvement.
I have removed the changes corresponding to the third improvement.
I have addressed the comment for 0002 patch and attached the patches.
Also, I have moved the tests in the 0002 to 0001 patch.
Thanks and Regards,
Shlok Kyal
Attachment | Content-Type | Size |
---|---|---|
v10-0002-Selective-Invalidation-of-Cache.patch | application/octet-stream | 1.7 KB |
v10-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch | application/octet-stream | 12.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2024-09-27 12:00:17 | Re: Extension security improvement: Add support for extensions with an owned schema |
Previous Message | Andrew Dunstan | 2024-09-27 11:07:51 | Re: SQL:2023 JSON simplified accessor support |