Re: PGDOCS - Logical replication GUCs - added some xrefs

From: samay sharma <smilingsamay(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PGDOCS - Logical replication GUCs - added some xrefs
Date: 2022-12-07 23:49:19
Message-ID: CAJxrbywJoYWe9FL7NjLhKm4ZGRytm-63r2n05NH2bz=oJL472g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Dec 6, 2022 at 11:12 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:

> On Tue, Dec 6, 2022 at 5:57 AM samay sharma <smilingsamay(at)gmail(dot)com>
> wrote:
> >
> > Hi,
> >
> > On Mon, Oct 24, 2022 at 12:45 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> >>
> >> Hi hackers.
> >>
> >> There is a docs Logical Replication section "31.10 Configuration
> >> Settings" [1] which describes some logical replication GUCs, and
> >> details on how they interact with each other and how to take that into
> >> account when setting their values.
> >>
> >> There is another docs Server Configuration section "20.6 Replication"
> >> [2] which lists the replication-related GUC parameters, and what they
> >> are for.
> >>
> >> Currently AFAIK those two pages are unconnected, but I felt it might
> >> be helpful if some of the parameters in the list [2] had xref links to
> >> the additional logical replication configuration information [1]. PSA
> >> a patch to do that.
> >
> >
> > +1 on the patch. Some feedback on v5 below.
> >
>
> Thanks for your detailed review comments!
>
> I have changed most things according to your suggestions. Please check
> patch v6.
>

Thanks for the changes. See a few points of feedback below.

> + <para>
> + For <emphasis>logical replication</emphasis>,
<firstterm>publishers</firstterm>
> + (servers that do <link
linkend="sql-createpublication"><command>CREATE
PUBLICATION</command></link>)
> + replicate data to <firstterm>subscribers</firstterm>
> + (servers that do <link
linkend="sql-createsubscription"><command>CREATE
SUBSCRIPTION</command></link>).
> + Servers can also be publishers and subscribers at the same time.
Note,
> + the following sections refers to publishers as "senders". The
parameter
> + <literal>max_replication_slots</literal> has a different meaning
for the
> + publisher and subscriber, but all other parameters are relevant
only to
> + one side of the replication. For more details about logical
replication
> + configuration settings refer to
> + <xref linkend="logical-replication-config"/>.
> + </para>

The second last line seems a bit odd here. In my last round of feedback, I
had meant to add the line "The parameter .... " onwards to the top of
logical-replication-config.sgml.

What if we made the top of logical-replication-config.sgml like below?

Logical replication requires several configuration options to be set. Most
configuration options are relevant only on one side of the replication
(i.e. publisher or subscriber). However, max_replication_slots is
applicable on both sides but has different meanings on each side.

>
> > > + <para>
> > > + For <firstterm>logical replication</firstterm> configuration
> settings refer
> > > + also to <xref linkend="logical-replication-config"/>.
> > > + </para>
> > > +
> >
> > I feel the top paragraph needs to explain terminology for logical
> replication like it does for physical replication in addition to linking to
> the logical replication config page. I'm recommending this as we use terms
> like subscriber etc. in description of parameters without introducing them
> first.
> >
> > As an example, something like below might work.
> >
> > These settings control the behavior of the built-in streaming
> replication feature (see Section 27.2.5) and logical replication (link).
> >
> > For physical replication, servers will be either a primary or a standby
> server. Primaries can send data, while standbys are always receivers of
> replicated data. When cascading replication (see Section 27.2.7) is used,
> standby servers can also be senders, as well as receivers. Parameters are
> mainly for sending and standby servers, though some parameters have meaning
> only on the primary server. Settings may vary across the cluster without
> problems if that is required.
> >
> > For logical replication, servers will either be publishers (also called
> senders in the sections below) or subscribers. Publishers are ....,
> Subscribers are...
> >
>
> OK. I split this blurb into 2 parts – streaming and logical
> replication. The streaming replication part is the same as before. The
> logical replication part is new.
>
> > > + <para>
> > > + See <xref linkend="logical-replication-config"/> for more
> details
> > > + about setting <varname>max_replication_slots</varname> for
> logical
> > > + replication.
> > > + </para>
> >
> >
> > The link doesn't add any new information regarding max_replication_slots
> other than "to reserve some for table sync" and has a good amount of
> unrelated info. I think it might be useful to just put a line here asking
> to reserve some for table sync instead of linking to the entire logical
> replication config section.
> >
>
> OK. I copied the tablesync note back to config.sgml definition of
> 'max_replication_slots' and removed the link as suggested. Frankly, I
> also thought it is a bit strange that the max_replication_slots in the
> “Sending Servers” section was describing this parameter for
> “Subscribers”. OTOH, I did not want to split the definition in half so
> instead, I’ve added another Subscriber <varlistentry> that just refers
> back to this place. It looks like an improvement to me.
>

Hmm, I agree this is a tricky scenario. However, to me, it seems odd to
mention the parameter twice as this chapter of the docs just lists each
parameter and describes them. So, I'd probably remove the reference to it
in the subscriber section. We should describe it's usage in different
places in the logical replication part of the docs (as we do).

>
> > > - Logical replication requires several configuration options to be
> set.
> > > + Logical replication requires several configuration parameters to
> be set.
> >
> > May not be needed? The docs have references to both options and
> parameters but I don't feel strongly about it. Feel free to use what you
> prefer.
>
> OK. I removed this.
>
> >
> > I think we should add an additional line to the intro here saying that
> parameters are mostly relevant only one of the subscriber or publisher.
> Maybe a better written version of "While max_replication_slots means
> different things on the publisher and subscriber, all other parameters are
> relevant only on either the publisher or the subscriber."
> >
>
> OK. Done but with slightly different wording to that.
>
> > > + <sect2 id="logical-replication-config-notes">
> > > + <title>Notes</title>
> >
> > I don't think we need this sub-section. If I understand correctly, these
> parameters are effective only on the subscriber side. So, any reason to not
> include them in that section?
>
> OK. I moved these notes into the "Subscribers" section as suggested,
> and removed "Notes".
>
> >
> > > +
> > > + <para>
> > > + Logical replication workers are also affected by
> > > + <link
> linkend="guc-wal-receiver-timeout"><varname>wal_receiver_timeout</varname></link>,
> > > + <link
> linkend="guc-wal-receiver-status-interval"><varname>wal_receiver_status_interval</varname></link>
> and
> > > + <link
> linkend="guc-wal-retrieve-retry-interval"><varname>wal_receiver_retry_interval</varname></link>.
> > > + </para>
> > > +
> >
> > I like moving this; it makes more sense here. Should we remove it from
> config.sgml? It seems a bit out of place there as we generally talk only
> about individual parameters there and this line is general logical
> replication subscriber advise which is more suited to
> logical-replication.sgml
>
> OK. I agree, it looked repetitive since the link to the
> logical-replication page is nearby this information anyway, so I’ve
> removed it from the config.sgml as you suggested.
>
> >
> > > + <para>
> > > + Configuration parameter
> > > + <link
> linkend="guc-max-worker-processes"><varname>max_worker_processes</varname></link>
> > > + may need to be adjusted to accommodate for replication workers,
> at least (
> > > + <link
> linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
> > > + + <literal>1</literal>). Some extensions and parallel queries
> also take
> > > + worker slots from <varname>max_worker_processes</varname>.
> > > + </para>
> > > +
> > > + </sect2>
> >
> > I think we should move this to the subscriber section as said above.
> It's useful to know this and people might skip over the notes.
>
> OK. Done.
>

> + <para>
> + <link
linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
> + must be set to at least the number of subscriptions (for apply
workers), plus
> + some reserve for the table synchronization workers. Configuration
parameter
> + <link
linkend="guc-max-worker-processes"><varname>max_worker_processes</varname></link>
> + may need to be adjusted to accommodate for replication workers, at
least (
> + <link
linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
> + + <literal>1</literal>). Note, some extensions and parallel queries
also
> + take worker slots from <varname>max_worker_processes</varname>.
> + </para>

Maybe do max_worker_processes in a new line like the rest.

Regards,
Samay
Microsoft

>
>
> ------
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-12-07 23:52:41 Re: Error-safe user functions
Previous Message Andres Freund 2022-12-07 23:45:05 Re: [PATCH] pg_dump: lock tables in batches