Re: replication_origin and replication_origin_lsn usage on subscriber

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, movead(dot)li(at)highgo(dot)ca, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: replication_origin and replication_origin_lsn usage on subscriber
Date: 2020-07-14 05:30:14
Message-ID: CAFiTN-sS+sR5fKfHhmc_3QPHK+9Wp5D79OeZpuWeFCLb+ApS_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> >
> > Hi,
> >
> > On 09/07/2020 14:34, Amit Kapila wrote:
> > > On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> > >>
> > >> On 09/07/2020 13:10, Amit Kapila wrote:
> > >>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >>>>
> > >>>> During logical decoding, we send replication_origin and
> > >>>> replication_origin_lsn when we decode commit. In pgoutput_begin_txn,
> > >>>> we send values for these two but never used on the subscriber side.
> > >>>> Though we have provided a function (logicalrep_read_origin) to read
> > >>>> these two values but that is not used in code anywhere.
> > >>>>
> > >>
> > >> We don't use the origin message anywhere really because we don't support
> > >> origin forwarding in the built-in replication yet. That part I left out
> > >> intentionally in the original PG10 patchset as it's mostly useful for
> > >> circular replication detection when you want to replicate both ways.
> > >> However that's relatively useless without also having some kind of
> > >> conflict detection which would be another huge pile of code and I
> > >> expected we would end up not getting logical replication in PG10 at all
> > >> if I tried to push conflict detection as well :)
> > >>
> > >
> > > Fair enough. However, without tests and more documentation about this
> > > concept, it is likely that future development might break it. It is
> > > good that you and others who know this part well are there to respond
> > > but still, the more documentation and tests would be preferred.
> > >
> >
> > Honestly that part didn't even need to be committed given it's unused.
> > Protocol supports versioning so it could have been added at later time.
> >
> > >>>
> > >>> For the purpose of decoding in-progress transactions, I think we can
> > >>> send replication_origin in the first 'start' message as it is present
> > >>> with each WAL record, however replication_origin_lsn is only logged at
> > >>> commit time, so can't send it before commit. The
> > >>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
> > >>> but it is not clear how and when that function can be used. Do we
> > >>> really need replication_origin_lsn before we decode the commit record?
> > >>>
> > >>
> > >> That's the SQL interface, C interface does not require that and I don't
> > >> think we need to do that.
> > >>
> > >
> > > I think when you are saying SQL interface, you referred to
> > > pg_replication_origin_xact_setup() but I am not sure which C interface
> > > you are referring to in the above sentence?
> > >
> >
> > All the stuff pg_replication_origin_xact_setup does internally.
> >
> > >> The existing apply code sets the
> > >> replorigin_session_origin_lsn only when processing commit message IIRC.
> > >>
> > >
> > > That's correct. However, we do send it via 'begin' callback which
> > > won't be possible with the streaming of in-progress transactions. Do
> > > we need to send this origin related information (origin, origin_lsn)
> > > while streaming of in-progress transactions? If so, when? As far as
> > > I can see, the origin_id can be sent with the first 'start' message.
> > > The origin_lsn and origin_commit can be sent with the last 'start' of
> > > streaming commit if we want but not sure if that is of use. If we
> > > need to send origin_lsn earlier than that then we need to record it
> > > with other WAL records (other than Commit WAL record).
> > >
> >
> > If we were to support the origin forwarding, then strictly speaking we
> > need everything only at commit time from correctness perspective,
> >
>
> Okay. Anyway streaming mode is optional, so in such cases, we can keep it 'off'
>
> > but
> > ideally origin_id would be best sent with first message as it can be
> > used to filter out changes at decoding stage rather than while we
> > process the commit so having it set early improves performance of decoding.
> >
>
> Yeah, makes sense. So, we will just send origin_id (with first
> streaming start message) and leave others.

So IIUC, currently we are sending the latest origin_id which is set
during the commit time. So in our case, while we start streaming we
will send the origin_id of the latest change in the current stream
right? I think we will always have to remember the latest origin id
in top-level ReorderBufferTXN as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-07-14 05:31:04 Re: TAP tests and symlinks on Windows
Previous Message Pavel Stehule 2020-07-14 05:25:56 Re: DROP relation IF EXISTS Docs and Tests - Bug Fix