Re: Logical Replication of sequences

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2024-07-31 07:25:39
Message-ID: CAHut+PvEgiPyV4sKbDaokVYV7K0wJXkP33W6-9P6Q1=edwQB1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vignesh,

Here are my review comments for your latest 0730_2* patches.

Patch v20240730_2-0001 looks good to me.

Patch v20240730_2-0002 looks good to me.

My comments for the v20240730_2-0003 patch are below:

//////////

GENERAL

1. Inconsistent terms

I've noticed there are many variations of how the sequence sync worker is known:
- "sequencesync worker"
- "sequence sync worker"
- "sequence-sync worker"
- "sequence synchronization worker"
- more?

We must settle on some standardized name.

AFAICT we generally use "table synchronization worker" in the docs,
and "tablesync worker" in the code and comments. IMO, we should do
same as that for sequences -- e.g. "sequence synchronization worker"
in the docs, and "sequencesync worker" in the code and comments.

======
doc/src/sgml/catalogs.sgml

nitpick - the links should jump directly to REFRESH PUBLICATION or
REFRESH PUBLICATION SEQUENCES. Currently they go to the top of the
ALTER SUBSCRIPTION page which is not as useful.

======
src/backend/commands/sequence.c

do_setval:
nitpick - minor wording in the function header
nitpick - change some param names to more closely resemble the fields
they get assigned to (/logcnt/log_cnt/, /iscalled/is_called/)

~

2.
seq->is_called = iscalled;
- seq->log_cnt = 0;
+ seq->log_cnt = (logcnt == SEQ_LOG_CNT_INVALID) ? 0: logcnt;

The logic here for SEQ_LOG_CNT_INVALID seemed strange. Why not just
#define SEQ_LOG_CNT_INVALID as 0 in the first place if that is what
you will assign for invalid? Then you won't need to do anything here
except seq->log_cnt = log_cnt;

======
src/backend/catalog/pg_subscription.c

HasSubscriptionRelations:
nitpick - I think the comment "If even a single tuple exists..." is
not quite accurate. e.g. It also has to be the right kind of tuple.

~~

GetSubscriptionRelations:
nitpick - Give more description in the function header about the other
parameters.
nitpick - I felt that a better name for 'all_relations' is all_states.
Because in my mind *all relations* sounds more like when both
'all_tables' and 'all_sequences' are true.
nitpick - IMO add an Assert to be sure something is being fetched.
Assert(get_tables || get_sequences);
nitpick - Rephrase the "skip the tables" and "skip the sequences"
comments to be more aligned with the code condition.

~

3.
- if (not_ready)
+ /* Get the relations that are not in ready state */
+ if (get_tables && !all_relations)
ScanKeyInit(&skey[nkeys++],
Anum_pg_subscription_rel_srsubstate,
BTEqualStrategyNumber, F_CHARNE,
CharGetDatum(SUBREL_STATE_READY));
+ /* Get the sequences that are in init state */
+ else if (get_sequences && !all_relations)
+ ScanKeyInit(&skey[nkeys++],
+ Anum_pg_subscription_rel_srsubstate,
+ BTEqualStrategyNumber, F_CHAREQ,
+ CharGetDatum(SUBREL_STATE_INIT));

This is quite tricky, using multiple flags (get_tables and
get_sequences) in such a way. It might even be a bug -- e.g. Is the
'else' keyword correct? Otherwise, when both get_tables and
get_sequences are true, and all_relations is false, then the sequence
part wouldn't even get executed (???).

======
src/backend/commands/subscriptioncmds.c

CreateSubscription:
nitpick - let's move the 'tables' declaration to be beside the
'sequences' var for consistency. (in passing move other vars too)
nitpick - it's not strictly required for the patch, but let's change
the 'tables' loop to be consistent with the new sequences loop.

~~~

4. AlterSubscription_refresh

My first impression (from the function comment) is that these function
parameters are a bit awkward. For example,
- It says: If 'copy_data' parameter is true, the function will set
the state to "init"; otherwise, it will set the state to "ready".
- It also says: "If 'all_relations' is true, mark all objects with
"init" state..."
Those statements seem to clash. e.g. if copy_data is false but
all_relations is true, then what (???)

~

nitpick - tweak function comment wording.
nitpick - introduce a 'relkind' variable to avoid multiple calls of
get_rel_relkind(relid)
nitpick - use an existing 'relkind' variable instead of calling
get_rel_relkind(relid);
nitpick - add another comment about skipping (for dropping tablesync slots)

~

5.
+ /*
+ * If all the relations should be re-synchronized, then set the
+ * state to init for re-synchronization. This is currently
+ * supported only for sequences.
+ */
+ else if (all_relations)
+ {
+ ereport(DEBUG1,
+ (errmsg_internal("sequence \"%s.%s\" of subscription \"%s\" set to
INIT state",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
sub->name)));
+ UpdateSubscriptionRelState(sub->oid, relid, SUBREL_STATE_INIT,
+ InvalidXLogRecPtr);

(This is a continuation of my doubts regarding 'all_relations' in the
previous review comment #4 above)

Here are some more questions about it:

~

5a. Why is this an 'else' of the !bsearch? It needs more explanation
what this case means.

~

5b. Along with more description, it might be better to reverse the
!bsearch condition, so this ('else') code is not so distantly
separated from the condition.

~

5c. Saying "only supported for sequences" seems strange: e.g. what
would it even mean to "re-synchronize" tables? They would all have to
be truncated first -- so if re-sync for tables has no meaning maybe
the parameter is misnamed and should just be 'resync_all_sequences' or
similar? In any case, an Assert here might be good.

======
src/backend/replication/logical/launcher.c

logicalrep_worker_find:

nitpick - I feel the function comment "We are only interested in..."
is now redundant since you are passing the exact worker type you want.
nitpick - I added an Assert for the types you are expecting to look for
nitpick - The comment "Search for attached worker..." is stale now
because there are more search criteria
nitpick - IMO the "Skip parallel apply workers." code is no longer
needed now that you are matching the worker type.

~~~

6. logicalrep_worker_launch

* - must be valid worker type
* - tablesync workers are only ones to have relid
* - parallel apply worker is the only kind of subworker
+ * - sequencesync workers will not have relid
*/
Assert(wtype != WORKERTYPE_UNKNOWN);
Assert(is_tablesync_worker == OidIsValid(relid));
Assert(is_parallel_apply_worker == (subworker_dsm != DSM_HANDLE_INVALID));
+ Assert(!is_sequencesync_worker || !OidIsValid(relid));

On further reflection, is that added comment and added Assert even
needed? I think they can be removed because saying "tablesync workers
are only ones to have relid" seems to already cover what we needed to
say/assert.

~~~

logicalrep_worker_stop:
nitpick - /type/wtype/ for readability

~~~

7.
/*
* Count the number of registered (not necessarily running) sync workers
* for a subscription.
*/
int
logicalrep_sync_worker_count(Oid subid)

~

I thought this function should count the sequencesync worker as well.

======
.../replication/logical/sequencesync.c

fetch_remote_sequence_data:
nitpick - tweaked function comment
nitpick - /value/last_value/ for readability

~

8.
+ *lsn = DatumGetInt64(slot_getattr(slot, 4, &isnull));
+ Assert(!isnull);

Should that be DatumGetUInt64?

~~~

copy_sequence:
nitpick - tweak function header.
nitpick - renamed the sequence vars for consistency, and declared them
all together.

======
src/backend/replication/logical/tablesync.c

9.
void
invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
{
- table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
+ relation_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
}

I assume you changed the 'table_states_validity' name because this is
no longer exclusively for tables. So, should the function name also be
similarly changed?

~~~

process_syncing_sequences_for_apply:
nitpick - tweaked the function comment
nitpick - cannot just say "if there is not one already." a sequence
syn worker might not even be needed.
nitpick - added blank line for readability

~

10.
+ if (syncworker)
+ {
+ /* Now safe to release the LWLock */
+ LWLockRelease(LogicalRepWorkerLock);
+ break;
+ }
+ else
+ {

This 'else' can be removed if you wish to pull back all the indentation.

~~~

11.
process_syncing_tables(XLogRecPtr current_lsn)

Is the function name still OK given that is is now also syncing for sequences?

~~~

FetchTableStates:
nitpick - Reworded some of the function comment
nitpick - Function comment is stale because it is still referring to
the function parameter which this patch removed.
nitpick - tweak a comment

======
src/include/commands/sequence.h

12.
+#define SEQ_LOG_CNT_INVALID (-1)

See a previous review comment (#2 above) where I wondered why not use
value 0 for this.

~~~

13.
extern void SequenceChangePersistence(Oid relid, char newrelpersistence);
extern void DeleteSequenceTuple(Oid relid);
extern void ResetSequence(Oid seq_relid);
+extern void do_setval(Oid relid, int64 next, bool iscalled, int64 logcnt);
extern void ResetSequenceCaches(void);

do_setval() was an OK function name when it was static, but as an
exposed API it seems like a terrible name. IMO rename it to something
like 'SetSequence' to match the other API functions nearby.

~

nitpick - same change to the parameter names as suggested for the
implementation.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_20240731_SEQ_003.txt text/plain 18.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2024-07-31 07:27:15 Re: Lack of possibility to specify CTAS TAM
Previous Message David G. Johnston 2024-07-31 07:22:00 Re: Lack of possibility to specify CTAS TAM