From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Causal reads take II
Date: 2017-06-24 04:05:21
Message-ID: CAEepm=1k0VyP8s1yA56_VBmfoXFrsfFHjFOtQjVO_MbxDukyLA@mail.gmail.com
Lists: pgsql-hackers
On Fri, Jun 23, 2017 at 11:48 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Apply the patch after first applying a small bug fix for replication
> lag tracking[4]. Then:
That bug fix was committed, so now causal-reads-v17.patch can be
applied directly on top of master.
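In case it saves anyone a step, applying and building it is just the
usual (a sketch, assuming the patch file sits at the top of a git
checkout of master):

    cd postgresql                         # git checkout of master
    patch -p1 < causal-reads-v17.patch    # or: git apply causal-reads-v17.patch
    ./configure --prefix=$HOME/pg && make && make install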
> 1. Set up some streaming replicas.
> 2. Stick causal_reads_max_replay_lag = 2s (or any time you like) in
> the primary's postgresql.conf.
> 3. Set causal_reads = on in some transactions on various nodes.
> 4. Try to break it!
Someone asked me off-list how to set this up quickly and easily for
testing. Here is a shell script that will start up a primary server
(port 5432) and 3 replicas (ports 5441 to 5443). Set the two paths at
the top of the file before running it. Log in with psql postgres [-p
<port>], then SET causal_reads = on to test its effect.
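For reference, the script boils down to something like this (an
illustrative sketch, not the attached script verbatim: the /tmp paths
and replica names are mine, and it assumes the patched build's bin
directory is on PATH; recent master allows replication connections
from localhost by default):

    initdb -D /tmp/primary
    echo "causal_reads_max_replay_lag = 2s" >> /tmp/primary/postgresql.conf
    pg_ctl -D /tmp/primary -l /tmp/primary.log start

    for port in 5441 5442 5443 ; do
      # clone the primary; -R writes a recovery configuration whose
      # primary_conninfo carries the application_name used below
      pg_basebackup -D /tmp/replica$port -R \
          -d "port=5432 application_name=replica$port"
      echo "port = $port" >> /tmp/replica$port/postgresql.conf
      pg_ctl -D /tmp/replica$port -l /tmp/replica$port.log start
    done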
causal_reads_max_replay_lag is set to 2s, and depending on your
hardware you might find that stuff like CREATE TABLE big_table AS
SELECT generate_series(1, 10000000) or a large COPY data load causes
replicas to be kicked out of the set after a while. You can also pause
replay on the replicas with SELECT pg_wal_replay_pause() and
pg_wal_replay_resume(), kill -STOP/-CONT or -9 the walreceiver
processes to simulate various failure modes, or run the replicas
remotely and unplug the network. SELECT application_name, replay_lag,
causal_reads_state FROM pg_stat_replication to see the current
situation, and also monitor the primary's LOG messages about
transitions. You should find that the
"read-your-writes-or-fail-explicitly" guarantee is upheld, no matter
what you do, and furthermore that failing or lagging replicas don't
hold the primary up for very long: in the worst case
causal_reads_lease_time for lost contact, and in the best case the
time to exchange a couple of messages with the standby to tell it its
lease is revoked and it should start raising an error. You might find
test-causal-reads.c[1] useful for testing.
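Here's the sort of session I mean (an untested sketch, assuming the
ports used above and the v17 names):

    # on the primary: generate enough WAL to build up replay lag
    psql postgres -c "CREATE TABLE big_table AS
                      SELECT generate_series(1, 10000000) AS n"

    # on the primary: watch the standbys' states
    psql postgres -c "SELECT application_name, replay_lag, causal_reads_state
                      FROM pg_stat_replication"

    # make one replica lag artificially...
    psql postgres -p 5441 -c "SELECT pg_wal_replay_pause()"

    # ...and ask it for a causal read: once its lease is revoked it
    # should raise an error instead of returning stale data
    psql postgres -p 5441 -c "SET causal_reads = on;
                              SELECT count(*) FROM big_table"

    psql postgres -p 5441 -c "SELECT pg_wal_replay_resume()"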
> Maybe it needs a better name.
Ok, how about this: the feature could be called "synchronous replay".
The new column in pg_stat_replication could be called sync_replay
(like the other sync_XXX columns). The GUCs could be called
synchronous_replay, synchronous_replay_max_lag and
synchronous_replay_lease_time. The language in log messages could
refer to standbys "joining the synchronous replay set".
Restating the purpose of the feature with that terminology: If
synchronous_replay is set to on, then you see the effects of all
synchronous_replay = on transactions that committed before your
transaction began, or an error is raised if that is not possible on
the current node. This allows applications to direct read-only
queries to read-only replicas for load balancing without seeing stale
data. Is that clearer?
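To make that concrete, an application would do something like this
(sketch only, using the proposed names, which exist in no released
PostgreSQL; the accounts table is just an example):

    # on the primary: a write covered by synchronous replay
    psql postgres -c "SET synchronous_replay = on;
                      UPDATE accounts SET balance = balance - 100 WHERE id = 1"

    # later, on any read-only replica: either this sees the update,
    # or it raises an error so the application can retry on another node
    psql postgres -p 5441 -c "SET synchronous_replay = on;
                              SELECT balance FROM accounts WHERE id = 1"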
Restating the relationship with synchronous replication with that
terminology: while synchronous_commit and synchronous_standby_names
are concerned with distributed durability, synchronous_replay is
concerned with distributed visibility. While the former prevents
commits from returning if the configured level of durability isn't met
(for example "must be flushed on master + any 2 standbys"), the latter
will simply drop any standbys from the synchronous replay set if they
fail or lag more than synchronous_replay_max_lag. It is reasonable to
want to use both features at once: my policy on distributed
durability might be that I want all transactions to be flushed to disk
on master + any of three servers before I report information to users,
and my policy on distributed visibility might be that I want to be
able to run read-only queries on any of my six read-only replicas, but
don't want to wait for any that lag by more than 1 second.
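In postgresql.conf terms, that combination might look like this on the
primary (a sketch: the synchronous_replay_* GUC is the proposed name
above, s1..s6 are placeholder standby names, and 'ANY 1 (s1, s2, s3)'
is one reading of "any of three servers"):

    # distributed durability: commit waits until flushed on the
    # primary plus at least one of three named standbys
    synchronous_commit = on
    synchronous_standby_names = 'ANY 1 (s1, s2, s3)'

    # distributed visibility: any of the six replicas may serve
    # synchronous_replay queries, but is dropped from the replay
    # set if it lags by more than 1 second
    synchronous_replay_max_lag = 1s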
Thoughts?
--
Thomas Munro
http://www.enterprisedb.com
Attachment: test-causal-reads.sh (application/x-sh, 1.8 KB)