From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Shlok Kyal' <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | RE: long-standing data loss bug in initial sync of logical replication |
Date: | 2024-09-26 11:53:07 |
Message-ID: | TYAPR01MB569258257A869477A27F0BD0F56A2@TYAPR01MB5692.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Shlok,
> Hi,
>
> I tried to add changes to selectively invalidate the cache to reduce
> the performance degradation during the distribution of invalidations.
Thanks for improving the patch!
>...
>
> Solution:
> 1. When we alter a publication using commands like ‘ALTER PUBLICATION
> pub_name DROP TABLE table_name’, first all tables in the publications
> are invalidated using the function ‘rel_sync_cache_relation_cb’. Then
> again ‘rel_sync_cache_publication_cb’ function is called which
> invalidates all the tables.
On my environment, rel_sync_cache_publication_cb() was called first and invalidate
all the entries, then rel_sync_cache_relation_cb() was called and the specified
entry is invalidated - hence second is NO-OP.
> This happens because of the following
> callback registered:
> CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
> rel_sync_cache_publication_cb, (Datum) 0);
But even in this case, I could understand that you want to remove the
rel_sync_cache_publication_cb() callback.
> 2. When we add/drop a schema to/from a publication using command like
> ‘ALTER PUBLICATION pub_name ADD TABLES in SCHEMA schema_name’, first
> all tables in that schema are invalidated using
> ‘rel_sync_cache_relation_cb’ and then again
> ‘rel_sync_cache_publication_cb’ function is called which invalidates
> all the tables.
Even in this case, rel_sync_cache_publication_cb() was called first and then
rel_sync_cache_relation_cb().
>
> 3. When we alter a namespace using command like ‘ALTER SCHEMA
> schema_name RENAME to new_schema_name’ all the table in cache are
> invalidated as ‘rel_sync_cache_publication_cb’ is called due to the
> following registered callback:
> CacheRegisterSyscacheCallback(NAMESPACEOID,
> rel_sync_cache_publication_cb, (Datum) 0);
>
> So, we added a new callback function ‘rel_sync_cache_namespacerel_cb’
> will be called instead of function ‘rel_sync_cache_publication_cb’ ,
> which invalidates only the cache of the tables which are part of that
> particular namespace. For the new function the ‘namespace id’ is added
> in the Invalidation message.
Hmm, I feel this fix is too much. Unlike ALTER PUBLICATION statements, I think
ALTER SCHEMA is rarely executed at the production stage. However, this approach
requires adding a new cache callback system, which affects the entire postgres
system; this is not very beneficial compared to the outcome. It should be discussed
on another thread to involve more people, and then we can add the improvement
after being accepted.
> Performance Comparison:
> I have run the same tests as shared in [1] and observed a significant
> decrease in the degradation with the new changes. With selective
> invalidation degradation is around ~5%. This results are an average of
> 3 runs.
IIUC, the executed workload did not contain ALTER SCHEMA command, so
third improvement did not contribute this improvement.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2024-09-26 12:19:46 | Re: not null constraints, again |
Previous Message | Amit Kapila | 2024-09-26 10:58:54 | Re: Documentation to upgrade logical replication cluster |