Re: Proposal: "Causal reads" mode for load balancing reads without stale data

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date: 2015-11-15 14:50:25
Message-ID: CA+TgmoZXruGn-A6ak8mfHnOEgNOGsC+Sxg_zGGwLn4-MLd4hng@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 15, 2015 at 5:41 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Hmm, if that's where we're at, I'll summarize my thoughts.
>
> All of this discussion presupposes we are distributing/load balancing
> queries so that reads and writes might occur on different nodes.

Agreed. I think that's a pretty common pattern, though certainly not
the only one.

> We need a good balancer. Any discussion of this that ignores the balancer
> component is only talking about half the solution. What we need to do is
> decide whether functionality should live in the balancer or the core.

I'm all in favor of having a load-balancer in core, but that seems
completely unrelated to the patch at hand.

> Your option (1) is viable, but only in certain cases. We could add support
> for some token/wait mechanism but as you say, this would require application
> changes not pooler changes.

Agreed.
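
(For concreteness, here is roughly what that application-side pattern
looks like today, sketched in Python with psycopg2 against the existing
pg_current_xlog_location() / pg_last_xlog_replay_location() functions.
The table name and polling interval are invented for illustration.)

    import time
    import psycopg2

    def write_then_read(primary_dsn, standby_dsn):
        # Commit the write on the primary, then capture a WAL position
        # that covers the commit record -- that's the "token".
        primary = psycopg2.connect(primary_dsn)
        cur = primary.cursor()
        cur.execute("UPDATE accounts SET balance = balance - 10 "
                    "WHERE id = 1")
        primary.commit()
        cur.execute("SELECT pg_current_xlog_location()")
        token = cur.fetchone()[0]
        primary.close()

        # Poll the standby until replay has reached the token, then read.
        standby = psycopg2.connect(standby_dsn)
        cur = standby.cursor()
        while True:
            cur.execute("SELECT pg_last_xlog_replay_location() "
                        ">= %s::pg_lsn", (token,))
            if cur.fetchone()[0]:
                break
            time.sleep(0.01)
        cur.execute("SELECT balance FROM accounts WHERE id = 1")
        row = cur.fetchone()
        standby.close()
        return row[0]

Every write-read pair has to carry that token around, which is why this
only works if you can change the application.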

> Your option (2) is wider but also worse in some ways. It can be implemented
> in a pooler.
>
> Your option (3) doesn't excite me much. You've got a load of stuff that
> really should happen in a pooler. And at its core we have synchronous_commit
> = apply but with a timeout rather than a wait. So anyway, consider me nudged
> to finish my patch to provide capability for that by 1 Jan.

I don't see how either option (2) or option (3) could be implemented
in a pooler. How would that work?

To be frank, it's starting to seem to me like you are just trying to
block this patch so you can have time to develop your own version
instead. I hope that's not the case, because it would be quite
unfair. When Thomas originally posted the patch, you complained that
"This causes every writer to wait. What we want is to isolate the wait
only to people performing a write-read sequence, so I think it should
be readers that wait. Let's have that debate up front before we start
reviewing the patch." Now, you seem to be saying that's OK, because
you want to post a patch to do exactly the same thing under the name
synchronous_commit=apply, but you want it to be your own patch,
leaving out the other stuff that Thomas has put into this one.

That could be the right thing to do, but how about we discuss it a
bit? The timeout stuff that Thomas has added here is really useful.
Without that, if a machine goes down, we wait forever. That's the
right thing to do if we're replicating to make sure transactions can
never be lost, but it's a bad idea if we're replicating for load
balancing. In the load balancing case, you want to drop sync slaves
quickly to ensure the cluster remains available, and you need them to
know they are out of sync so that the load balancer doesn't get
confused. That's exactly what is implemented here.
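
(In rough pseudocode, with invented names rather than anything from the
actual patch, the primary-side rule amounts to:)

    import time

    APPLY_TIMEOUT = 4.0  # stand-in for whatever timeout setting governs this

    def prune_wait_set(standbys, now=None):
        # Drop any standby that hasn't confirmed apply within the timeout,
        # so commits stop blocking on it. A dropped standby must also learn
        # that it is out of sync, so it can start refusing reads.
        now = time.monotonic() if now is None else now
        return [s for s in standbys
                if now - s["last_apply_ack"] <= APPLY_TIMEOUT]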

If you have an idea for a simpler implementation, great, but I think
we need something. I don't see how it's going to work to make it
entirely the pooler's job to figure out whether the cluster is in sync
- it needs a push from the core server. Here, that push is easy to
find: if a particular replica starts returning the "I'm out of sync"
error when you query it, then stop routing queries to that replica
until the error clears (which the pooler can find out by polling it
with some trivial query). That's a great deal more useful than
synchronous_commit=apply without such a feature.
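
(Here is a sketch of that pooler-side logic in Python with psycopg2.
The exact error the standby raises is whatever the patch defines, so
the error handling below is deliberately coarse and purely
illustrative:)

    import psycopg2

    def route_read(replicas, sql, params=None):
        # Try replicas believed to be in sync; on an error, assume it is
        # the "I'm out of sync" error and take the replica out of rotation.
        for replica in replicas:
            if replica.get("suspect"):
                continue
            try:
                conn = psycopg2.connect(replica["dsn"])
                cur = conn.cursor()
                cur.execute(sql, params)
                rows = cur.fetchall()
                conn.close()
                return rows
            except psycopg2.Error:
                replica["suspect"] = True
        raise RuntimeError("no in-sync replica available")

    def probe(replica):
        # Poll a suspect replica with a trivial query, assuming the probe
        # session runs with the causal-reads behavior enabled so the query
        # keeps failing until the standby is back in sync.
        try:
            conn = psycopg2.connect(replica["dsn"])
            conn.cursor().execute("SELECT 1")
            conn.close()
            replica["suspect"] = False
        except psycopg2.Error:
            pass  # still out of sync; keep it out of rotation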

> On a related note, any further things like "GUC causal_reads_standby_names"
> should be implemented by Node Registry as a named group of nodes. We can
> have as many arbitrary groups of nodes as we want. If that sounds strange
> look back at exactly why GUCs are called GUCs.

I think a node registry is a good idea, and my impression from the
session in Vienna is that quite a few other hackers do, too. But I also
don't think it's remotely reasonable to make that a precondition for
accepting this patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
