Re: Logical replication - initial data synchronization

From: Koen De Groote <kdg(dot)dev(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication - initial data synchronization
Date: 2024-10-17 08:59:51
Message-ID: CAGbX52HcDV7S5tEbsQEDWAJkfMBrm7OYaCmn_bt5shtm_Td-YQ@mail.gmail.com
Lists: pgsql-docs

Hello Bruce, thanks for picking this up.

Personally, I would make explicit mention of the fact that creating the
snapshot and copying the data is taken care of by Postgres itself. Those
are the points that had me confused early on, wondering if I had to perform
the copy once the snapshot was ready.

Now that I have used LR for months, this seems odd as I write it, but I
remember it being part of my initial confusion.

Instead of:
"Internally logical replication of a table starts by taking a snapshot
of the data on the publisher database and copying that to the
subscriber."

I would say:
"When logical replication is started for a table, Postgres internally
takes a snapshot of the table data on the publisher database,
and then copies that data to the subscriber."

Also, I would change:

"Once complete, the changes on the publisher are sent to the subscriber"

To:

"Once complete, any changes on the publisher since the initial copy are
sent to the subscriber"

This is more explicit and clear, I feel.
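The proposed wording describes a two-step sequence: copy a snapshot of the existing data, then send only the changes made after that snapshot. Below is a toy Python model of that sequence. It assumes an in-memory table and a numbered change log that stands in for WAL positions. `Publisher`, `initial_sync` and `stream_changes` are hypothetical names, not PostgreSQL APIs. In PostgreSQL itself the user performs neither step by hand. `CREATE SUBSCRIPTION` copies pre-existing data by default (its `copy_data` option defaults to `true`) and then streams subsequent changes.

```python
from dataclasses import dataclass, field

# Toy model only: hypothetical names, not PostgreSQL internals.


@dataclass
class Publisher:
    """A table plus an ordered change log (a stand-in for WAL positions)."""
    table: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # entries: (lsn, op, key, value)

    def write(self, op, key, value=None):
        """Apply a change locally and record it in the log."""
        lsn = len(self.log) + 1
        self.log.append((lsn, op, key, value))
        apply_change(self.table, op, key, value)

    def snapshot(self):
        """Return a copy of the table and the log position it reflects."""
        return dict(self.table), len(self.log)


def apply_change(table, op, key, value):
    """Apply one logged change to a dict-based table."""
    if op == "delete":
        table.pop(key, None)
    else:  # "insert" or "update"
        table[key] = value


def initial_sync(pub):
    """Step 1: copy a consistent snapshot of the pre-existing data."""
    data, snapshot_lsn = pub.snapshot()
    return data, snapshot_lsn


def stream_changes(pub, subscriber, from_lsn):
    """Step 2: apply only changes made after from_lsn; return the new position."""
    for lsn, op, key, value in pub.log:
        if lsn > from_lsn:
            apply_change(subscriber, op, key, value)
    return len(pub.log)


pub = Publisher()
pub.write("insert", 1, "a")
pub.write("insert", 2, "b")      # pre-existing data
sub, pos = initial_sync(pub)      # the copy happens without user action
pub.write("update", 1, "A")       # changes made after the snapshot
pub.write("delete", 2)
pub.write("insert", 3, "c")
pos = stream_changes(pub, sub, pos)
print(sub)  # {1: 'A', 3: 'c'}
```

Skipping every change at or before the snapshot position is what keeps the subscriber from applying pre-existing rows twice; that is the point the phrase "since the initial copy" makes explicit.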

And then, to be consistent, I'd also use this wording in the last change,
replacing:

"publisher database. Once complete, changes on the publisher are sent"

to

"publisher database. Once complete, any changes on the publisher since the
initial copy are sent"

Hope that's ok.

Thanks for looking into this.

Regards,
Koen De Groote

On Thu, Oct 17, 2024 at 3:20 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Sat, May 18, 2024 at 09:02:11PM +0000, PG Doc comments form wrote:
> > The following documentation comment has been logged on the website:
> >
> > Page: https://www.postgresql.org/docs/16/logical-replication-subscription.html
> > Description:
> >
> > I'm reading up on Logical Replication and have been reading the pages in
> > order.
> >
> > The first 2 pages:
> > https://www.postgresql.org/docs/current/logical-replication.html and
> > https://www.postgresql.org/docs/current/logical-replication-publication.html
> > both speak of the requirement to set up a snapshot and explain that
> > publication will then send further updates as they happen to subscribers.
> >
> > But the 3rd page,
> > https://www.postgresql.org/docs/current/logical-replication-subscription.html
> > now mentions this: "Additional replication slots may be required for the
> > initial data synchronization of pre-existing table data and those will be
> > dropped at the end of data synchronization."
> >
> > For me, reading the first 2 pages implied that I would have to perform some
> > manual command that starts the creation of a snapshot of pre-existing table
> > data, and unpack this on the subscriber node somehow.
> >
> > The text on the "Subscription" page sounds to me like this is actually
> > something the publisher<-> subscriber model of the postgres software can
> > manage on its own. As opposed to a snapshot, which feels more like the
> > concept of a basebackup.
> >
> > Regardless of that being correct or not, my current impression is that the
> > description isn't consistent across pages. Maybe the text is obvious for
> > people who've performed setup of logical replication before, but I have
> > never done this. To me, the description on the first 2 pages seems
> > inconsistent with the description I just encountered on the 3rd page. I was
> > under the impression there was no such thing as "initial data
> > synchronization of pre-existing table data" in terms of postgres doing this
> > by itself.
> >
> > Am I missing something extremely simple, or can the description of the
> > involved operations be made more consistent across documentation pages?
>
> Is the attached patch an improvement?
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
> EDB https://enterprisedb.com
>
> When a patient asks the doctor, "Am I going to die?", he means
> "Am I going to die soon?"
>
