Replication origins and timelines

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Replication origins and timelines
Date: 2017-06-01 01:12:04
Message-ID: CAMsr+YHbU5oiQzVRhBRgwcQeXN5-pKQeovurNXSV8HHJyVxFow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all

TL;DR: replication origins track LSN without timeline. This is
ambiguous when physical failover is present since XXXXXXXX/XXXXXXXX
can now represent more than one state due to timeline forks with
promotions. Replication origins should track timelines so we can tell
the difference, I propose to patch them accordingly for pg11.

---------

When replication origins were introduced, they deliberately left out
tracking of the upstream node's timeline. Logical decoding couldn't
follow a timeline switch anyway, and replicas (still) have no facility
for logical decoding so everything completely breaks on promotion of a
physical replica.

I'm working on fixing that so that logical decoding and logical
replication integrates properly with physical replication and
failover. But when that works we'll face the same problem in logical
rep that timelines were introduced to solve for physical rep.

To prevent undetected misreplication we'll need to keep track of the
timeline of the last-replicated LSN in our downstream replication
origin. So I propose to add a timeline field to replication origins
for pg11.

Why?

Take master A, its physical replica B, and logical decoding client X
streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
at 1/500. C has replicated from A up to 1/1000, when A fails. We
promote B to replace A. Now C connects to B, and requests to resume at
LSN 1/1000.

If B has since done enough work for its insert position to pass
1/1000, C will completely skip whatever B did between 1/500 and
1/1000, thinking (incorrectly) that it already replayed it. And it
will have *extra data* from A from the 1/500 to 1/1000 range that B
lost. It'll pick up from B's 1/1000 and try to apply that on top of
A's 1/1000 state, potentially leading to a mangled mess.

In physical rep this would lead to serious data corruption and
crashes. In logical rep it'll most likely lead to conflicts, apply
errors, inconsistent data, broken FKs, etc. It could be drastic, or
quite subtle, depending on app and workload.

But we really should still detect it. To do that, we need to remember
that our last replay position was (1/1000, 1) . And when we request to
start replay from 1/1000 at timeline 1 on B, it'll ERROR, telling us
that its timeline 1 ends at 1/500.

We could still *choose* to continue as if all was well, but by default
we'll detect the error.

But we can't do that unless replication origins on the downstream can
track the timeline.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-06-01 01:23:25 Re: Replication origins and timelines
Previous Message Stephen Frost 2017-06-01 01:04:24 Re: Patch: Add --no-comments to skip COMMENTs with pg_dump