From: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Replication origins and timelines |
Date: | 2017-06-01 01:12:04 |
Message-ID: | CAMsr+YHbU5oiQzVRhBRgwcQeXN5-pKQeovurNXSV8HHJyVxFow@mail.gmail.com |
Lists: | pgsql-hackers |
Hi all
TL;DR: replication origins track LSN without timeline. This is
ambiguous when physical failover is present since XXXXXXXX/XXXXXXXX
can now represent more than one state due to timeline forks with
promotions. Replication origins should track timelines so we can tell
the difference. I propose to patch them accordingly for pg11.
---------
When replication origins were introduced, they deliberately left out
tracking of the upstream node's timeline. Logical decoding couldn't
follow a timeline switch anyway, and replicas (still) have no facility
for logical decoding so everything completely breaks on promotion of a
physical replica.
I'm working on fixing that so that logical decoding and logical
replication integrate properly with physical replication and
failover. But when that works we'll face the same problem in logical
rep that timelines were introduced to solve for physical rep.
To prevent undetected misreplication we'll need to keep track of the
timeline of the last-replicated LSN in our downstream replication
origin. So I propose to add a timeline field to replication origins
for pg11.
Why?
Take master A, its physical replica B, and logical decoding client C
streaming changes from A. B is lagging. A is at LSN 1/1000, B is only
at 1/500. C has replicated from A up to 1/1000, when A fails. We
promote B to replace A. Now C connects to B, and requests to resume at
LSN 1/1000.
If B has since done enough work for its insert position to pass
1/1000, C will completely skip whatever B did between 1/500 and
1/1000, thinking (incorrectly) that it already replayed it. And it
will have *extra data* from A from the 1/500 to 1/1000 range that B
lost. It'll pick up from B's 1/1000 and try to apply that on top of
A's 1/1000 state, potentially leading to a mangled mess.
In physical rep this would lead to serious data corruption and
crashes. In logical rep it'll most likely lead to conflicts, apply
errors, inconsistent data, broken FKs, etc. It could be drastic, or
quite subtle, depending on app and workload.
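To make the ambiguity concrete, here is a minimal sketch in Python of what an LSN-only resume check sees. The function names and B's post-promotion insert position of 1/2000 are made up for illustration; only the hi/lo hex LSN notation matches PostgreSQL. This is not how the walsender is implemented.

```python
def parse_lsn(text):
    """Parse an LSN written as 'hi/lo' (two hex halves) into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def naive_can_resume(requested_lsn, server_insert_lsn):
    """Hypothetical LSN-only check: accept the start point as long as the
    server has written WAL at least that far. No timeline is consulted."""
    return parse_lsn(requested_lsn) <= parse_lsn(server_insert_lsn)


def divergent_range(promotion_lsn, downstream_lsn):
    """Bytes of WAL the downstream replayed from the old master that the
    promoted replica never had. Zero if the downstream was not ahead."""
    return max(0, parse_lsn(downstream_lsn) - parse_lsn(promotion_lsn))


# C remembers only "1/1000". B was promoted at 1/500 and has since
# (by assumption) advanced to 1/2000 on its new timeline.
accepted = naive_can_resume("1/1000", "1/2000")
lost = divergent_range("1/500", "1/1000")
```

With these numbers the naive check accepts the resume request even though C holds changes from A's 1/500 to 1/1000 range that B never saw.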
But we really should still detect it. To do that, we need to remember
that our last replay position was (1/1000, 1). And when we request to
start replay from 1/1000 at timeline 1 on B, it'll ERROR, telling us
that its timeline 1 ends at 1/500.
We could still *choose* to continue as if all was well, but by default
we'll detect the error.
But we can't do that unless replication origins on the downstream can
track the timeline.
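As a rough sketch of the check described above, again in Python and again with made-up names: assume the origin stores an (LSN, timeline) pair and the upstream knows where each of its timelines ended. The `history` mapping, `TimelineMismatch`, and the `allow_divergence` flag are hypothetical and are not part of any existing PostgreSQL API.

```python
class TimelineMismatch(Exception):
    """Raised when the requested start point is not in the server's history."""


def parse_lsn(text):
    """Parse an LSN written as 'hi/lo' (two hex halves) into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def check_start_point(history, tli, lsn, allow_divergence=False):
    """Validate a requested (lsn, tli) start point against a server's history.

    history maps timeline ID -> LSN where that timeline ended on this
    server, or None for the timeline it is currently on (assumed shape).
    Returns True if the start point is in the server's history. Returns
    False if it is not but the caller chose to continue anyway.
    """
    if tli not in history:
        if allow_divergence:
            return False
        raise TimelineMismatch(f"timeline {tli} is not in this server's history")
    end = history[tli]
    if end is not None and parse_lsn(lsn) > parse_lsn(end):
        if allow_divergence:
            return False
        raise TimelineMismatch(
            f"requested start {lsn} on timeline {tli}, "
            f"but timeline {tli} ends at {end} on this server"
        )
    return True


# B forked off timeline 1 at 1/500 and is now on timeline 2.
b_history = {1: "1/500", 2: None}
```

Calling `check_start_point(b_history, 1, "1/1000")` raises, which is the default behaviour proposed above. Passing `allow_divergence=True` models choosing to continue as if all was well.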
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services