sentPtr jumping back at the beginning of logical replication

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: sentPtr jumping back at the beginning of logical replication
Date: 2021-10-06 09:53:11
Message-ID: CAExHW5sRNPPVLB380YpRLLRHSmSssWeESQ3O=omWzM1P9sQkpQ@mail.gmail.com
Lists: pgsql-hackers

Hi All,
sentPtr reported by the WAL sender should never jump back; it should
only increase.
I observed a strange behaviour with the WAL sender where sentPtr jumps
back at the beginning of logical replication. From code examination,
the following behaviour looks like the culprit.

The WAL sender starts reading WAL from restart_lsn, which is what
XLogBeginRead sets in reader->EndRecPtr. So reader->EndRecPtr starts at
restart_lsn.

sentPtr starts at MyReplicationSlot->data.confirmed_flush in
StartLogicalReplication(). Usually some concurrent transaction is in
progress, so confirmed_flush is higher than restart_lsn. After the
first loop over send_data in WalSndLoop(), sentPtr is set to
reader->EndRecPtr. So when the first WAL record is read, sentPtr jumps
back to the end of the first record starting at restart_lsn.
Eventually it catches up to confirmed_flush as the WAL sender reads
more WAL.
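
To make the sequence concrete, here is a small standalone model of the
two pointers. It is only a sketch of the behaviour described above, not
walsender code: SenderModel, sender_start, sender_read_record and
count_backward_jumps are hypothetical names, the LSN values are made
up, and the model assumes restart_lsn <= confirmed_flush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PostgreSQL represents an LSN as an unsigned 64-bit integer. */
typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for the two pointers discussed above. */
typedef struct SenderModel
{
    XLogRecPtr end_rec_ptr; /* models reader->EndRecPtr */
    XLogRecPtr sent_ptr;    /* models sentPtr */
} SenderModel;

/*
 * Models the start of logical replication: the reader begins at
 * restart_lsn while sentPtr begins at confirmed_flush.
 */
void
sender_start(SenderModel *s, XLogRecPtr restart_lsn,
             XLogRecPtr confirmed_flush)
{
    /* Assumption of this model, not a check taken from the server. */
    assert(restart_lsn <= confirmed_flush);
    s->end_rec_ptr = restart_lsn;
    s->sent_ptr = confirmed_flush;
}

/*
 * Models one pass of the send loop: read one record of record_len
 * bytes, then set sentPtr to the end of that record.
 */
void
sender_read_record(SenderModel *s, XLogRecPtr record_len)
{
    s->end_rec_ptr += record_len;
    s->sent_ptr = s->end_rec_ptr;
}

/* Counts how many times the modelled sentPtr decreases. */
int
count_backward_jumps(XLogRecPtr restart_lsn, XLogRecPtr confirmed_flush,
                     const XLogRecPtr *record_lens, size_t nrecords)
{
    SenderModel s;
    int         jumps = 0;

    sender_start(&s, restart_lsn, confirmed_flush);
    for (size_t i = 0; i < nrecords; i++)
    {
        XLogRecPtr prev = s.sent_ptr;

        sender_read_record(&s, record_lens[i]);
        if (s.sent_ptr < prev)
            jumps++;
    }
    return jumps;
}
```

In this model the only backward jump happens on the first record, and
only when confirmed_flush is ahead of the end of that record; after
that the modelled sentPtr only moves forward.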

This seems to be harmless, but the logical receiver may get confused if
it receives an LSN lower than confirmed_flush.

--
Best Wishes,
Ashutosh Bapat
