Re: Logical Decoding follows timelines

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Decoding follows timelines
Date: 2015-01-03 10:07:29
Message-ID: 54A7BF61.9080708@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/17/2014 10:35 AM, Simon Riggs wrote:
> On 16 December 2014 at 21:17, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>>>> This patch is a WIP version of doing that, but only currently attempts
>
>>> With the patch, XLogSendLogical uses the same logic to calculate SendRqstPtr
>>> that XLogSendPhysical does. It would be good to refactor that into a common
>>> function, rather than copy-paste.
>>
>> Some of the logic is similar, but not all.
>>
>>> SendRqstPtr isn't actually used for anything in XLogSendLogical.
>>
>> It exists to allow the call which resets TLI.
>>
>> I'll see if I can make it exactly identical; I didn't think so when I
>> first looked, will look again.
>
> Yes, that works. New version attached

Some comments, mostly on readability (not all of these were this patch's
fault):

> /*
> * Check that the timeline the client requested for exists, and
> * the requested start location is on that timeline.
> */
> (void) ReadSendTimeLine(cmd->timeline);
>
> /*
> * Found the requested timeline in the history. Check that
> * requested startpoint is on that timeline in our history.
> *
> * This is quite loose on purpose. We only check that we didn't
> * fork off the requested timeline before the switchpoint. We
> * don't check that we switched *to* it before the requested
> * starting point. This is because the client can legitimately
> * request to start replication from the beginning of the WAL
> * segment that contains switchpoint, but on the new timeline, so
> * that it doesn't end up with a partial segment. If you ask for a
> * too old starting point, you'll get an error later when we fail
> * to find the requested WAL segment in pg_xlog.
> *
> * XXX: we could be more strict here and only allow a startpoint
> * that's older than the switchpoint, if it's still in the same
> * WAL segment.
> */

The first comment implies that the ReadSendTimeLine call checks that the
requested start location is on the timeline, but that's actually done by
the code that follows the second comment. I would merge these two
comments, and move the ReadSendTimeLine call below the merged comment.

> @@ -577,8 +571,8 @@ StartReplication(StartReplicationCmd *cmd)
> * that's older than the switchpoint, if it's still in the same
> * WAL segment.
> */
> - if (!XLogRecPtrIsInvalid(switchpoint) &&
> - switchpoint < cmd->startpoint)
> + if (!XLogRecPtrIsInvalid(sendTimeLineValidUpto) &&
> + sendTimeLineValidUpto < cmd->startpoint)
> {
> ereport(ERROR,
> (errmsg("requested starting point %X/%X on timeline %u is not in this server's history",

IMHO using the local 'switchpoint' variable was more clear.

> @@ -941,6 +936,8 @@ StartLogicalReplication(StartReplicationCmd *cmd)
> * Force a disconnect, so that the decoding code doesn't need to care
> * about an eventual switch from running in recovery, to running in a
> * normal environment. Client code is expected to handle reconnects.
> + * This covers the race condition where we are promoted half way
> + * through starting up.
> */
> if (am_cascading_walsender && !RecoveryInProgress())
> {

We could exit recovery immediately after this check. Why is this check
needed?

> /*
> + * Find the timeline for the start location, or throw an error.
> + *
> + * Logical replication relies upon replication slots. Each slot has a
> + * single timeline history baked into it, so this should be easy.
> + */

I don't understand what "baked in" means here.

> + /*
> + * Get the SendRqstPtr and follow any timeline changes.
> + */
> + SendRqstPtr = GetLatestRequestPtr();

The old comment used to say "Figure out how far we can safely send the
WAL". I think that was much more clear. It's not clear what following
timeline changes means here, and the fact that it "gets the SendRqstPtr"
is obvious from the code.

> +
> +static XLogRecPtr
> +GetLatestRequestPtr(void)

This function desperately needs comment to explain what it does. I don't
much like its name either.

> +static TimeLineID
> +ReadSendTimeLine(TimeLineID tli)

Ditto. This function is also missing a "return".

I think it would slightly more intuitive if this function didn't set the
global variables directly, but simply returned the returned values to
the caller.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-01-03 10:41:49 Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
Previous Message Peter Geoghegan 2015-01-03 10:00:52 Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)