Re: Logical Replication of sequences

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2024-07-29 05:49:39
Message-ID: CALDaNm3SucGGLe-B-a_aqWNWQZ-yfxFTiAA0JyP-SwX4jq9Y3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 25 Jul 2024 at 12:54, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, here are more review comments for patch v20240720-0003.
> 7.
> The logic seems over-complicated. For example, why is the sequence
> list *always* fetched, but the tables list is only sometimes fetched?
> Furthermore, this 'refresh_all_sequences' parameter seems to have a
> strange interference with tables (e.g. even though it is possible to
> refresh all tables and sequences at the same time). It is as if the
> meaning is 'refresh_publication_sequences' yet it is not called that
> (???)
>
> These gripes may be related to my other thread [1] about the new ALTER
> syntax. (I feel that there should be the ability to refresh ALL TABLES
> or ALL SEQUENCES independently if the user wants to). IIUC, it would
> simplify this function logic as well as being more flexible. Anyway, I
> will leave the discussion about syntax to that other thread.

1) ALTER SUBCRIPTION ... REFRESH PUBLICATION
This command will refresh both tables and sequences. It will remove
stale tables and sequences and include newly added tables and
sequences.
2) ALTER SUBCRIPTION ... REFRESH PUBLICATION SEQUENCES
This command will refresh only sequences. It will remove stale
sequences and synchronize all sequences including the existing
sequences.
So the table will be fetched only for the first command.
I have changed refresh_publication_sequences parameter to tables,
sequences, all_relations with this the function should be easier to
understand and remove any confusions.

> ~
>
> 9.
> ereport(DEBUG1,
> - (errmsg_internal("table \"%s.%s\" removed from subscription \"%s\"",
> + (errmsg_internal("%s \"%s.%s\" removed from subscription \"%s\"",
> + get_namespace_name(get_rel_namespace(relid)),
> + get_rel_name(relid),
> + sub->name,
> + get_rel_relkind(relid) == RELKIND_SEQUENCE ? "sequence" : "table")));
>
> IIUC prior conDitions mean get_rel_relkind(relid) == RELKIND_SEQUENCE
> will be impossible here.

Consider a scenario where logical replication is setup with sequences
seq1, seq2.
Now drop sequence seq1 and do "ALTER SUBSCRIPTION sub REFRESH PUBLICATION"
It will hit this code to generate the log:
DEBUG: sequence "public.seq1" removed from subscription "test1"

> ======
> .../replication/logical/sequencesync.c
>
> 13. fetch_remote_sequence_data
>
> The "current state" mentioned in the function comment is a bit vague.
> Can't tell from this comment what it is returning without looking
> deeper into the function code.

Added more comments to clarify.

> ~~~
>
> 15. process_syncing_sequences_for_apply
>
> + /* We need up-to-date sync state info for subscription sequences here. */
> + FetchTableStates(&started_tx, SUB_REL_KIND_ALL);
>
> Should that say SUB_REL_KIND_SEQUENCE?

We cannot pass SUB_REL_KIND_SEQUENCE here because the
pg_subscription_rel table is shared between sequences and tables. As
changes to either sequences or relations can affect the validity of
relation states, we update both table_states_not_ready and
sequence_states_not_ready simultaneously to ensure consistency, rather
than updating them separately. I have removed the relation kind
parameter now. Fetch tables is called to fetch all tables and
sequences before calling process_syncing_tables_for_apply and
process_syncing_sequences_for_apply now.

> ~
>
> 16.
> + /*
> + * If there are free sync worker slot(s), start a new sequence
> + * sync worker, and break from the loop.
> + */
> + if (nsyncworkers < max_sync_workers_per_subscription)
>
> Should this "if" have some "else" code to log a warning if we have run
> out of free workers? Otherwise, how will the user know that the system
> may need tuning?

I felt no need to log here else we will get a lot of log messages
which might not be required. Similar logic is used for tablesync to in
process_syncing_tables_for_apply.

> ~~~
>
> 17. FetchTableStates
>
> /* Fetch all non-ready tables. */
> - rstates = GetSubscriptionRelations(MySubscription->oid, true);
> + rstates = GetSubscriptionRelations(MySubscription->oid, rel_type, true);
>
> This feels risky. IMO there needs to be some prior Assert about the
> rel_type. For example, if it happened to be SUB_REL_KIND_SEQUENCE then
> this function code doesn't seem to make sense.

The pg_subscription_rel table is shared between sequences and tables.
As changes to either sequences or relations can affect the validity of
relation states, we update both table_states_not_ready and
sequence_states_not_ready simultaneously to ensure consistency, rather
than updating them separately. This will update both tables and
sequences that should be synced.

The rest of the comments are fixed. The attached v20240729 version
patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v20240729-0001-Introduce-pg_sequence_state-and-SetSequenc.patch text/x-patch 13.9 KB
v20240729-0003-Enhance-sequence-synchronization-during-su.patch text/x-patch 82.9 KB
v20240729-0002-Introduce-ALL-SEQUENCES-support-for-Postgr.patch text/x-patch 90.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-07-29 05:52:06 Re: Logical Replication of sequences
Previous Message David Rowley 2024-07-29 05:07:06 Re: Add mention of execution time memory for enable_partitionwise_* GUCs