Re: PG DOCS - logical replication filtering

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PG DOCS - logical replication filtering
Date: 2022-04-12 08:30:40
Message-ID: CAHut+PsBr1DtSPKfdU42gB_ufooKFk7W28fC-co+ThB2JkSghw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 12, 2022 at 3:33 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:
>
> On Mon, Apr 11, 2022 at 12:39 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > OK. Added in v7 [1]
> >
>
> Thanks, this looks mostly good to me. I didn't like the new section
> added for partitioned tables examples, so I removed it and added some
> explanation of the tests. I have slightly changed a few other lines. I
> am planning to commit the attached tomorrow unless there are more
> comments.
>
> I have a few comments.

Thanks for your review comments! I addressed most of them as suggested
- see the details below.

>
> > If a publication publishes UPDATE and/or DELETE operations ...
> >
>
> If we are talking about operations, use lowercase like I suggested in the
> previous version. See the publish parameter [1]. If we are talking about
> commands, the word "operations" should be removed or replaced by "commands".
> The "and/or" isn't required, "or" implies the same. If you prefer "operations",
> my suggestion is to adjust the last sentence that says "only INSERT" to "only
> <italic>insert</italic> operation".

Not changed. Because in fact, I copied most of this sentence
(including the uppercase, "operations", "and/or") from existing
documentation [1]
e.g. see "The tables added to a publication that publishes UPDATE
and/or DELETE operations must ..."
[1] https://www.postgresql.org/docs/current/sql-createpublication.html

>
> > If the old row satisfies the row filter expression (it was sent to the
> > subscriber) but the new row doesn't, then from a data consistency perspective
> > the old row should be removed from the subscriber.
> >
>
> I suggested small additions to this sentence. We should at least add a comma
> after "then" and "perspective". The same applies to the next paragraph too.

Modified the commas in [v9] as suggested.

>
> Regarding the "Summary", it is redundant as I said before. We already described
> all 4 cases. I vote to remove it. However, if we would go with a table, I
> suggest to improve the formatting: add borders, "old row" and "new row" should
> be titles, "no match" and "match" should be represented by symbols ("" and "X",
> for example), and "Case X" column should be removed because this extra column
> adds nothing.

Yeah, I know the information is the same in the summary and in the
text. Personally, I find big slabs of technical text difficult to
digest, so I'd have to spend 5 minutes of careful reading and drawing
the exact same summary on a piece of paper anyway just to visualize
what the text says. The summary makes it easy to understand at a
glance. But I have modified the summary in [v9] to remove the "case"
and to add other column headings as suggested.

>
> Regarding the "Partitioned Tables", I suggested to remove the bullets. We
> generally have paragraphs with a few sentences. I tend to avoid short
> paragraphs. I also didn't like the 2 bullets using almost the same words. In
> the previous version, I suggested something like
>
> If the publication contains a partitioned table, the parameter
> publish_via_partition_root determines which row filter expression is used. If
> the parameter publish_via_partition_root is true, the row filter expression
> associated with the partitioned table is used. Otherwise, the row filter
> expression associated with the individual partition is used.
>

Modified in [v9] to remove the bullets.

> > will be copied. (see Section 31.3.6 for details).
>
> There is an extra period after "copied" that should be removed. The other
> option is to remove the parentheses and have another sentence for "See ...".
>

Modified in [v9] as suggested.

> > those expressions get OR'ed together
>
> I prefer plain English here. This part of the sentence is also redundant with
> the rest of the sentence so I suggested to remove it in the previous version.
>
> rows that satisfy any of the row filter expressions is replicated.
>
> instead of
>
> those expressions get OR'ed together, so that rows satisfying any of the
> expressions will be replicated.

Not changed. The readers of this docs page are all users who will be
familiar with the filter expressions, so I felt the "OR'ed together"
part will be perfectly clear to the intended audience.

>
> I also didn't use a different paragraph (like I suggested in the previous
> version) because we are talking about the same thing.
>

Modified in [v9] to use a single paragraph.

> The bullets in the example sounds strange, that's why I suggested removing it.
> We can even combine the 3 sentences into one paragraph.

Modified [v9] so the whole example now has no bullets. Also combined
all these 3 sentences as suggested.

>
> > The PSQL command \dRp+ shows the row filter expressions (if defined) for each
> > table of the publications.
>
> Well, we don't use PSQL (upppercase) in the documentation. I suggested a
> different sentence:
>
> The psql shows the row filter expressions (if defined) for each table.
>

Modified the sentence in [v9]. Now it uses lowercase psql.

> > The PSQL command \d shows what publications the table is a member of, as well
> > as that table's row filter expression (if defined) in those publications.
>
> It is not logical replication business to explain about psql behavior. If, for
> some reason, someone decided to change it, this section will contain obsolete
> information. The psql output is fine, the explanation is not. That's why I
> suggested this modification.

Modified [v9] this sentence also to use lowercase psql.

But I did not understand your point about “If, for some reason,
someone decided to change it, this section will contain obsolete
information”, because IIUC that will be equally true for both the
explanation and the output, so I did not understand why you say "psql
output is fine, the explanation is not". It is the business of this
documentation to help the user to know how and where they can find the
row filter information they may need to know.

>
> > Only the rows satisfying the t1 WHERE clause of publication p1 are
> > replicated.
>
> Again, no bullets. This sentence is useful *before* the psql output. We're
> presenting the results. Let's follow the pattern, describe the action and show
> the results.
>

OK. Modified all the [v9] example now has all the bullets removed and
follows the suggested pattern (e.g. where the notes always come
*before* the results)

> > The UPDATE replicates the change as normal.
>
> This sentence should be *before* the psql output (see my previous version).
>

Modified [v9] as suggested.

> Regarding the new examples (for partitioned tables), shouldn't we move the
> parent / child definitions to the beginning of the Examples section?

Not changed. IMO if we moved those CREATE TABLE as suggested then they
will then be too far away from where they are being used.

> It seems
> confusing use the same code snippet to show repeated table definitions
> (publisher and subscriber). I checked fast and after a few seconds I realized
> that the example is not wrong but the database name has a small difference (one
> letter "s" x "p"). The publication and subscription definitions are fine there.
>

Not changed. The publisher and subscriber programlistings are always
separated. If you are looking at the rendered HTML I think it is quite
clear that one is at the publisher and one is at the subscriber. OTOH,
if we omitted creating the tables on the subscriber then I think that
really would cause some confusion.

> I think reusing the same tables and publication introduces complexity.
> Shouldn't we just use different tables and publication to provide an "easy"
> example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.
>

Not changed. Those same tables were re-used *deliberately* so that the
examples could use identical inserts, and to emphasize that the
different behaviour was caused only by the
"publish_via_partition_root" setting.

> > Do the inserts same as before.
>
> We should indicate the node (publisher) to be clear.

OK. Modified [v9] as suggested.

------
[v9] https://www.postgresql.org/message-id/CAHut%2BPvYqo77rwg_vHC%3DOyQ7hCHZGVm%3DNi%2BJQbf8VyBz8hoo2w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dong Wook Lee 2022-04-12 08:48:40 GSoC: Improve PostgreSQL Regression Test Coverage
Previous Message Fabien COELHO 2022-04-12 08:19:07 Re: random() function documentation