From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal: "Causal reads" mode for load balancing reads without stale data |
Date: | 2015-11-12 00:49:30 |
Message-ID: | CAEepm=24grE9werAXQTtOFuO6YNKVFVqkOzmnntZfLNzdzqeBQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Nov 12, 2015 at 12:10 AM, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> wrote:
> On Wed, Nov 11, 2015 at 11:22 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > On Wed, Nov 11, 2015 at 9:42 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> wrote:
> >> On 11/11/2015 10:23 AM, Simon Riggs wrote:
> >>> Thanks for working on this issue.
> >>
> >> +1.
>
> +1. I have seen a lot of interest for something along these lines.
>
> >> I'm thinking the client should get some kind of a token back from the
> >> commit, and it could use the token on the standby, to wait for that
> commit
> >> to be applied. The token could be just the XID, or the LSN of the commit
> >> record. Or the application could generate the token and pass it to the
> >> server in the commit, similar to how 2PC works. So the interaction
> would be
> >> something like:
> >>
> >> In master:
> >> BEGIN;
> >> INSERT INTO FOO ...;
> >> COMMIT;
> >> Server returns: COMMITted with token 1234
> >>
> >> Later, in standby:
> >> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
> >> SELECT * FROM foo;
> >> ...
>
> To avoid read anomalies (backwards timetravel) it should also be
> possible to receive a token from read-only transactions based on the
> latest snapshot used.
>
> > My thinking was that the reason for wanting to load balance over a set of
> > hot standbys is because you have a very read-heavy workload, so it makes
> > sense to tax the writers and leave the many dominant readers unburdened,
> so
> > (3) should be better than (2) for the majority of users who want such a
> > configuration. (Note also that it's not a requirement to tax every
> write;
> > with this proposal you can set causal_reads to off for those transactions
> > where you know there is no possibility of a causally dependent read).
> >
> > As for (1), my thinking was that most application developers would
> probably
> > prefer not to have to deal with that type of interface. For users who do
> > want to do that, it would be comparatively simple to make that possible,
> and
> > would not conflict with this proposal. This proposal could be used by
> > people retrofitting load balancing to an existing applications with
> relative
> > ease, or simply not wanting to have to deal with LSNs and complexity. (I
> > have considered proposing pg_wait_for_xlog_replay_location(lsn, timeout)
> > separately, which could be called on a standby with the lsn obtained from
> > pg_current_xlog_location() on the primary any time after a COMMIT
> completes,
> > but I was thinking of that as a different feature addressing a different
> > user base: people prepared to do more work to squeeze out some extra
> > performance.)
>
> Although I still think that 1) is the correct long term solution I
> must say that I agree with the reasoning presented. I think we should
> review the API in the light that in the future we might have a mix of
> clients, some clients that are able to keep track of causality tokens
> and either want to wait when a read request arrives, or pick a host to
> use based on the token, and then there are "dumb" clients that want to
> use write side waits.
>
Exactly!
I see the causality tokens approach (thank you for that terminology) not so
much as a "long term" solution, but rather as an expert feature likely to
interest a small number of sophisticated users willing to take on more
responsibility in exchange for greater control. We should definitely add
support for that, and I expect the patch would be fairly simple and short.
But I believe the vast majority of users would like to be able to run new
and existing plain SQL on any node and see the data they just wrote, with
graceful failure modes, and without extra conceptual load or invasive code
changes. So I think we should cater for that mode of usage that too.
Also, it should be possible to configure which standbys are considered
> for waiting on. Otherwise a reporting slave will occasionally catch up
> enough to be considered "available" and then cause a latency peak when
> a long query blocks apply again.
>
Good point. Here's a new version which adds the GUC
causal_reads_standby_names, defaulting to '*' (but as before, the feature
is not activated until you set causal_reads_timeout). Now you can list
standby names explicitly if you want a way to exclude certain standbys.
Also, I noticed that cascaded standbys shouldn't be available for causal
reads, so I added a check for that.
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
causal-reads-v2.patch | application/octet-stream | 70.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2015-11-12 00:52:31 | Re: can we add SKIP LOCKED to UPDATE? |
Previous Message | Peter Geoghegan | 2015-11-12 00:44:02 | Re: WIP: Make timestamptz_out less slow. |