Re: Logical Replication of sequences

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2025-04-16 01:38:39
Message-ID: CAHut+Ptc0gG=4j_BVqxHRGa=TKY_PsYu0RdsT6YuWPiNkSRhOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vignesh,

Some review comments for patch v20250325-0005 (docs).

======
doc/src/sgml/catalogs.sgml

(52.55. pg_subscription_rel)

1.
State code: i = initialize, d = data is being copied, f = finished
table copy, s = synchronized, r = ready (normal replication)

~

This is not part of the patch, but AFAIK not all of those states are
relevant if the srrelid was a SEQUENCE relation. Should there be 2
sets of states given here for what is possible for tables/sequences?

======
doc/src/sgml/config.sgml

(19.6.4. Subscribers)

2.
<para>
In logical replication, this parameter also limits how often a failing
- replication apply worker or table synchronization worker will be
- respawned.
+ replication apply worker or table synchronization worker or sequence
+ synchronization worker will be respawned.
</para>

Does it though? I thought /often/quickly/ because if there is some
ERROR the respawning may occur again and again forever, so you are not
really limiting how often it occurs, only the *rate* at which the
respawning happens.

======
doc/src/sgml/logical-replication.sgml

(29.1 Publication)

3.
Publications may currently only contain tables and all tables in schema.
Objects must be added explicitly, except when a publication is created for
- <literal>ALL TABLES</literal>.
+ <literal>ALL TABLES</literal>. Publications can include sequences as well,
+ but their behavior differs from that of tables or groups of tables. Unlike
+ tables, sequences allow users to synchronize their current state at any
+ given time. For more information, refer to
+ <xref linkend="logical-replication-sequences"/>.
</para>

This doesn't really make sense. The change seems too obviously just
tacked onto the end of the existing documentation. e.g It is strange
saying "Publications may currently only contain tables and all tables
in schema." and then later saying Oh, BTW "Publications can include
sequences as well". I think the whole paragraph may need some
reworking.

The preceding para on this page also still says "A publication is a
set of changes generated from a table or a group of tables" which also
failed to mention sequences.

OTOH, I think it would get too confusing to mush everything together,
so after saying publish tables and sequences, then I think there
should be one paragraph that just talks about publishing tables,
followed by another paragraph that just talks about publishing
sequences. Probably this should mention ALL SEQUENCES too since it
already mentioned ALL TABLES.

~~~

(29.6. Replicating Sequences)

4.
+ <para>
+ To replicate sequences from a publisher to a subscriber, first publish the
+ sequence using <link
linkend="sql-createpublication-params-for-all-sequences">
+ <command>CREATE PUBLICATION ... FOR ALL SEQUENCES</command></link>.
+ </para>

/first publish the sequence/first publish them/

~~~

5.
+ <para>
+ A new <firstterm>sequence synchronization worker</firstterm> will be started
+ to synchronize the sequences after executing any of the above subscriber
+ commands, and will exit once the sequences are synchronized.
+ </para>

IMO the name of the worker makes it obvious what it does so I think
you can remove the redundant words "to synchronize the sequences" from
this sentence.

~~~

(29.7.1. Sequence Definition Mismatches)

6.
+ <para>
+ To resolve this, use
+ <link linkend="sql-altersequence"><command>ALTER SEQUENCE</command></link>
+ to align the subscriber's sequence parameters with those of the publisher.
+ Subsequently, execute <link
linkend="sql-altersubscription-params-refresh-publication-sequences">
+ <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION
SEQUENCES</command></link>.
+ </para>

/Subsequently,/Then,/

~~~

(29.6.3. Examples)

7.
+ <para>
+ Create some test sequences on the publisher.

This whole section is just examples, so I don't think you need to call
them "test" sequences. Just say "Create some sequences on the
publisher.".

~~~

(29.9. Restrictions)

8.
<para>
- Sequence data is not replicated. The data in serial or identity columns
- backed by sequences will of course be replicated as part of the table,
- but the sequence itself would still show the start value on the
- subscriber. If the subscriber is used as a read-only database, then this
- should typically not be a problem. If, however, some kind of switchover
- or failover to the subscriber database is intended, then the sequences
- would need to be updated to the latest values, either by copying the
- current data from the publisher (perhaps
- using <command>pg_dump</command>) or by determining a sufficiently high
- value from the tables themselves.
+ Incremental sequence changes are not replicated. The data in serial or
+ identity columns backed by sequences will of course be replicated as part
+ of the table, but the sequence itself would still show the start value on
+ the subscriber. If the subscriber is used as a read-only database, then
+ this should typically not be a problem. If, however, some kind of
+ switchover or failover to the subscriber database is intended, then the
+ sequences would need to be updated to the latest values, either
by executing
+ <link linkend="sql-altersubscription-params-refresh-publication-sequences">
+ <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION
SEQUENCES</command></link>
+ or by copying the current data from the publisher (perhaps using
+ <command>pg_dump</command>) or by determining a sufficiently high value
+ from the tables themselves.
</para>

This doesn't seem strictly correct to say "but the sequence itself
would still show the start value on the subscriber". AFAIK,
synchronization also happens on the CREATE SUBSCRIPTION command when
copy_data=true, so if any sequences had been published (FOR ALL
SEQUENCES) then the subscriber sequence would get the up-to-date
current values (not the "start value"), right?

~~~

(29.13.2. Subscribers)

9.
<para>
<link linkend="guc-max-sync-workers-per-subscription"><varname>max_sync_workers_per_subscription</varname></link>
controls the amount of parallelism of the initial data copy during the
- subscription initialization or when new tables are added.
+ subscription initialization or when new tables or sequences are added.
</para>

This seems kind of misleading because there is no "amount of
parallelism" for sequences since there is never more than one sequence
sync worker. Maybe there is a more accurate way to word this.

SUGGESTION (maybe like this?)
max_sync_workers_per_subscription controls how many tables can be
synchronized in parallel during subscription initialization or when
new tables are added. One additional worker is also needed for
sequence synchronization.

======
doc/src/sgml/ref/alter_subscription.sgml

(ALTER SUBSCRIPTION REFRESH PUBLICATION / copy_data)

10.
+ <para>
+ See <xref linkend="sequence-definition-mismatches"/> for
recommendations on how
+ to handle any warnings about differences in the sequence definition
+ between the publisher and the subscriber, which might occur when
+ <literal>copy_data = true</literal>.
+ </para>

/any warnings about differences in the sequence definition between/any
warnings about sequence definition differences between/

~~~

(ALTER SUBSCRIPTION REFRESH PUBLICATION SEQUENCES)

11.
+ <para>
+ See <xref linkend="sequence-definition-mismatches"/> for
+ recommendations on how to handle any warnings about differences in the
+ sequence definition between the publisher and the subscriber.
+ </para>

Ditto my previous review comment #10.

======
doc/src/sgml/ref/create_subscription.sgml

(CREATE SUBSCRIPTION / copy_data)

12.
+ <para>
+ See <xref linkend="sequence-definition-mismatches"/>
+ for recommendations on how to handle any warnings about
differences in
+ the sequence definition between the publisher and the subscriber,
+ which might occur when <literal>copy_data = true</literal>.
+ </para>

Ditto my previous review comment #10.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2025-04-16 02:10:59 transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]
Previous Message Noah Misch 2025-04-16 01:02:13 Re: [18] Unintentional behavior change in commit e9931bfb75