From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com> |
Cc: | Euler Taveira <euler(at)eulerto(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Time delayed LR (WAS Re: logical replication restrictions) |
Date: | 2022-12-06 08:00:07 |
Message-ID: | CAHut+PsCF_a4Mpcdo5bcE-4YuYqzR2Kzj_v=3CFROTP2vnPECA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here are some review comments for patch v9-0001:
======
GENERAL
1. min_ prefix?
What's the significance of the "min_" prefix for this parameter? I'm
guessing the background is that at one time it was considered to be a
GUC so took a name similar to GUC recovery_min_apply_delay (??)
But in practice, I think it is meaningless and/or misleading. For
example, suppose the user wants to defer replication by 1hr. IMO it is
more natural to just say "defer replication by 1 hr" (aka
apply_delay='1hr') Clearly it means replication will take place about
1 hr into the future. OTHO saying "defer replication by a MINIMUM of 1
hr" (aka min_apply_delay='1hr') is quite vague because then it is
equally valid if the replication gets delayed by 1 hr or 2 hrs or 5
days or 3 weeks since all of those satisfy the minimum delay. The
implementation could hardwire a delay of INT_MAX ms but clearly,
that's not really what the user would expect.
~
So, I think this parameter should be renamed just as 'apply_delay'.
But, if you still decide to keep it as 'min_apply_delay' then there is
a lot of other code that ought to be changed to be consistent with
that name.
e.g.
- subapplydelay in catalogs.sgml --> subminapplydelay
- subapplydelay in system_views.sql --> subminapplydelay
- subapplydelay in pg_subscription.h --> subminapplydelay
- subapplydelay in dump.h --> subminapplydelay
- i_subapplydelay in pg_dump.c --> i_subminapplydelay
- applydelay member name of Form_pg_subscription --> minapplydelay
- "Apply Delay" for the column name displayed by describe.c --> "Min
apply delay"
- more...
(IMO the fact that so much code does not currently say 'min' at all is
just evidence that the 'min' prefix really didn't really mean much in
the first place)
======
doc/src/sgml/catalogs.sgml
2. Section 31.2 Subscription
+ <para>
+ Time delayed replica of subscription is available by indicating
+ <literal>min_apply_delay</literal>. See
+ <xref linkend="sql-createsubscription"/> for details.
+ </para>
How about saying like:
SUGGESTION
The subscriber replication can be instructed to lag behind the
publisher side changes by specifying the
<literal>min_apply_delay</literal> subscription parameter. See XXX for
details.
======
doc/src/sgml/ref/create_subscription.sgml
3. min_apply_delay
+ <para>
+ By default, subscriber applies changes as soon as possible. As with
+ the physical replication feature
+ (<xref linkend="guc-recovery-min-apply-delay"/>), it can be useful to
+ have a time-delayed logical replica. This parameter allows you to
+ delay the application of changes by a specified amount of time. If
+ this value is specified without units, it is taken as milliseconds.
+ The default is zero, adding no delay.
+ </para>
"subscriber applies" -> "the subscriber applies"
"allows you" -> "lets the user"
"The default is zero, adding no delay." -> "The default is zero (no delay)."
~
4.
+ larger than the time deviations between servers. Note that
+ in the case when this parameter is set to a long value, the
+ replication may not continue if the replication slot falls behind the
+ current LSN by more than <literal>max_slot_wal_keep_size</literal>.
+ See more details in <xref linkend="guc-max-slot-wal-keep-size"/>.
+ </para>
4a.
SUGGESTION
Note that if this parameter is set to a long delay, the replication
will stop if the replication slot falls behind the current LSN by more
than <literal>max_slot_wal_keep_size</literal>.
~
4b.
When it is rendered (like below) it looks a bit repetitive:
... if the replication slot falls behind the current LSN by more than
max_slot_wal_keep_size. See more details in max_slot_wal_keep_size.
~
IMO the previous sentence should include the link.
SUGGESTION
if the replication slot falls behind the current LSN by more than
<link linkend =
"guc-max-slot-wal-keep-size"><literal>max_slot_wal_keep_size</literal></link>.
~
5.
+ <para>
+ Synchronous replication is affected by this setting when
+ <varname>synchronous_commit</varname> is set to
+ <literal>remote_write</literal>; every <literal>COMMIT</literal>
+ will need to wait to be applied.
+ </para>
Yes, this deserves a big warning -- but I am just not quite sure of
the details. I think this impacts more than just "remote_rewrite" --
e.g. the same problem would happen if "synchronous_standby_names" is
non-empty.
I think this warning needs to be more generic to cover everything.
Maybe something like below
SUGGESTION:
Delaying the replication can mean there is a much longer time between
making a change on the publisher, and that change being committed on
the subscriber. This can have a big impact on synchronous replication.
See https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT
======
src/backend/commands/subscriptioncmds.c
6. parse_subscription_options
+ ms = interval_to_ms(interval);
+ if (ms < 0 || ms > INT_MAX)
+ ereport(ERROR,
+ errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("%lld ms is outside the valid range for option \"%s\"",
+ (long long) ms, "min_apply_delay"));
"for option" -> "for parameter"
======
src/backend/replication/logical/worker.c
7. apply_delay
+static void
+apply_delay(TimestampTz ts)
IMO having a delay is not the usual case. So, would a better name for
this function be 'maybe_delay'?
~
8.
+ * high value for the delay. This design is different from the physical
+ * replication (that applies the delay at commit time) mainly because write
+ * operations may allow some issues (such as bloat and locks) that can be
+ * minimized if it does not keep the transaction open for such a long time.
Something seems not quite right with this wording -- is there a better
way of describing this?
~
9.
+ /*
+ * Delay apply until all tablesync workers have reached READY state. If we
+ * allow the delay during the catchup phase, once we reach the limit of
+ * tablesync workers, it will impose a delay for each subsequent worker.
+ * It means it will take a long time to finish the initial table
+ * synchronization.
+ */
+ if (!AllTablesyncsReady())
+ return;
"Delay apply until..." -> "The min_apply_delay parameter is ignored until..."
~
10.
+ /*
+ * The worker may be waken because of the ALTER SUBSCRIPTION ...
+ * DISABLE, so the catalog pg_subscription should be read again.
+ */
+ if (!in_remote_transaction && !in_streamed_transaction)
+ {
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+ }
"waken" -> "woken"
======
src/bin/psql/describe.c
11. describeSubscriptions
+ /* Origin and min_apply_delay are only supported in v16 and higher */
if (pset.sversion >= 160000)
appendPQExpBuffer(&buf,
- ", suborigin AS \"%s\"\n",
- gettext_noop("Origin"));
+ ", suborigin AS \"%s\"\n"
+ ", subapplydelay AS \"%s\"\n",
+ gettext_noop("Origin"),
+ gettext_noop("Apply delay"));
IIUC the psql command is supposed to display useful information to the
user, so I wondered if it is worthwhile to put the units in this
column header -- "Apply delay (ms)" instead of just "Apply delay"
because that would make it far easier to understand the meaning
without having to check the documentation to discover the units.
======
src/include/utils/timestamp.h
12.
+extern int64 interval_to_ms(const Interval *interval);
+
For consistency with the other interval conversion functions exposed
here maybe this one should have been called 'interval2ms'
======
src/test/subscription/t/032_apply_delay.pl
13.
IIUC this test is checking if a delay has occurred by inspecting the
debug logs to see if a certain code path including "logical
replication apply delay" is logged. I guess that is OK, but another
way might be to compare the actual timing values of the published and
replicated rows.
The publisher table can have a column with default now() and the
subscriber side table can have an *additional* column also with
default now(). After replication, those two timestamp values can be
compared to check if the difference exceeds the min_time_delay
parameter specified.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2022-12-06 08:01:22 | Re: Generate pg_stat_get_* functions with Macros |
Previous Message | Amit Kapila | 2022-12-06 07:50:24 | Re: Perform streaming logical transactions by background workers and parallel apply |