Re: Logical Replication of sequences

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2024-08-06 09:08:00
Message-ID: CAHut+PvD4jUsgSkabE9UtKcpvXbLFUxktaWnz8KAKAry1tj2-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for the patch v20240805_2-0003.

======
doc/src/sgml/catalogs.sgml

nitpick - removed the word "either"

======
doc/src/sgml/ref/alter_subscription.sgml

I felt the discussions about "how to handle warnings" are a bit scattered:
e.g.1 - ALTER SUBSCRIPTION REFRESH PUBLICATION copy data referred to
CREATE SUBSCRIPTION copy data.
e.g.2 - ALTER SUBSCRIPTION REFRESH explains what to do, but now the
explanation is in 2 places.
e.g.3 - CREATE SUBSCRIPTION copy data explains what to do (again), but
IMO it belongs better in the common "Notes" part

FYI, I've moved all the information to one place (in the CREATE
SUBSCRIPTION "Notes") and others refer to this central place. See the
attached nitpicks diff.

REFRESH PUBLICATION copy_data
nitpick - now refers to CREATE SUBSCRIPTION "Notes". I also moved it
to be nearer to the other sequence stuff.

REFRESH PUBLICATION SEQUENCES:
nitpick - now refers to CREATE SUBSCRIPTION "Notes".

======
doc/src/sgml/ref/create_subscription.sgml

REFRESH PUBLICATION copy_data
nitpick - now refers to CREATE SUBSCRIPTION "Notes"

Notes:
nitpick - the explanation of, and what to do about sequence WARNINGS,
is moved to here

======
src/backend/commands/sequence.c

pg_sequence_state:
nitpick - I just moved the comment in pg_sequence_state() to below the
NOTE, which talks about "page LSN".

======
src/backend/catalog/pg_subscription.c

1. HasSubscriptionRelations

Should function 'HasSubscriptionRelations' be renamed to
'HasSubscriptionTables'?

~~~

GetSubscriptionRelations:
nitpick - tweak some "skip" comments.

======
src/backend/commands/subscriptioncmds.c

2. CreateSubscription

tables = fetch_table_list(wrconn, publications);
- foreach(lc, tables)
+ foreach_ptr(RangeVar, rv, tables)
+ {
+ Oid relid;
+
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
+
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
+
+ AddSubscriptionRelState(subid, relid, table_state,
+ InvalidXLogRecPtr, true);
+ }
+
+ /* Add the sequences in init state */
+ sequences = fetch_sequence_list(wrconn, publications);
+ foreach_ptr(RangeVar, rv, sequences)

These 2 loops (first for tables and then for sequences) seem to be
executing the same code. If you wanted you could combine the lists
up-front, and then have one code loop instead of 2. It would mean less
code. OTOH, maybe the current code is more readable? I am not sure
what is best, so just bringing this to your attention.

~~~

AlterSubscription_refresh:
nitpick = typo /indicating tha/indicating that/

~~~

3. fetch_sequence_list

+ appendStringInfoString(&cmd, "SELECT DISTINCT n.nspname, c.relname,
s.seqtypid, s.seqmin, s.seqmax, s.seqstart, s.seqincrement,
s.seqcycle"
+ " FROM pg_publication p, LATERAL
pg_get_publication_sequences(p.pubname::text) gps(relid),"
+ " pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace JOIN
pg_sequence s ON c.oid = s.seqrelid"
+ " WHERE c.oid = gps.relid AND p.pubname IN (");
+ get_publications_str(publications, &cmd, true);
+ appendStringInfoChar(&cmd, ')');

Please wrap this better to make the SQL more readable.

~~

4.
+ if (seqform->seqtypid != seqtypid || seqform->seqmin != seqmin ||
+ seqform->seqmax != seqmax || seqform->seqstart != seqstart ||
+ seqform->seqincrement != seqincrement ||
+ seqform->seqcycle != seqcycle)
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Sequence option in remote and local is not same for \"%s.%s\"",
+ get_namespace_name(get_rel_namespace(relid)), get_rel_name(relid)),
+ errhint("Alter/Re-create the sequence using the same options as in remote."));

4a.
Are these really known as "options"? Or should they be called
"sequence parameters", or something else, like "sequence attributes"?

4a.
Is there a way to give more helpful information by identifying what
was different in the log? OTOH, maybe it would become too messy if
there were multiple differences...

======
src/backend/replication/logical/launcher.c

5. logicalrep_sync_worker_count

- if (isTablesyncWorker(w) && w->subid == subid)
+ if ((isTableSyncWorker(w) || isSequenceSyncWorker(w)) &&
+ w->subid == subid)

You could micro-optimize this -- it may be more efficient to write the
condition the other way around.

SUGGESTION
if (w->subid == subid && (isTableSyncWorker(w) || isSequenceSyncWorker(w)))

======
.../replication/logical/sequencesync.c

File header comment:
nitpick - there seems a large cut/paste mistake (the first 2
paragraphs are almost the same).
nitpick - reworded with the help of Chat-GPT for slightly better
clarity. Also fixed a couple of typos.
nitpick - it mentioned MAX_SEQUENCES_SYNC_PER_BATCH several times so I
changed the wording of one of them

~~~

fetch_remote_sequence_data:
nitpick - all other params have the same name as sequence members, so
change the parameter name /lsn/page_lsn/

~

copy_sequence:
nitpick - rename var /seq_lsn/seq_page_lsn/

======
src/backend/replication/logical/tablesync.c

6. process_syncing_sequences_for_apply

+ * If a sequencesync worker is running already, there is no need to start a new
+ * one; the existing sequencesync worker will synchronize all the sequences. If
+ * there are still any sequences to be synced after the sequencesync worker
+ * exited, then a new sequencesync worker can be started in the next iteration.
+ * To prevent starting the sequencesync worker at a high frequency after a
+ * failure, we store its last failure time. We start the sync worker for the
+ * same relation after waiting at least wal_retrieve_retry_interval.

Why is it talking about "We start the sync worker for the same
relation ...". The sequencesync_failuretime is per sync worker, not
per relation. And, I don't see any 'same relation' check in the code.

======
src/include/catalog/pg_subscription_rel.h

GetSubscriptionRelations:
nitpick - changed parameter name /all_relations/all_states/

======
src/test/subscription/t/034_sequences.pl

nitpick - add some ########## comments to highlight the main test
parts to make it easier to read.
nitpick - fix typo /syned/synced/

7. More test cases?
IIUC you can also get a sequence mismatch warning during "ALTER ...
REFRESH PUBLICATION", and "CREATE SUBSCRIPTION". So, should those be
tested also?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_20240806_SEQ_0003.txt text/plain 14.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2024-08-06 09:34:12 Re: Support specify tablespace for each merged/split partition
Previous Message Bertrand Drouvot 2024-08-06 09:00:03 Re: Restart pg_usleep when interrupted