Re: BDR global sequences in two machine failover

From: Giovanni Maruzzelli <gmaruzz(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: BDR global sequences in two machine failover
Date: 2015-09-07 12:56:37
Message-ID: CALXCt0oq7O4vzg_GDo=_F_Ex-7WEQhRkRSMg6cddERL5BQ0kbQ@mail.gmail.com
Lists: pgsql-general

On Sep 7, 2015 5:05 AM, "Craig Ringer" <craig(at)2ndquadrant(dot)com> wrote:
>
> On 7 September 2015 at 00:18, Giovanni Maruzzelli <gmaruzz(at)gmail(dot)com> wrote:
> > Hello,
> >
> > Typical HA situation.
> >
> > I have a master-master setup with only two machines, one active and one
> > passive (standby), with a floating IP.
> > I write to only one machine at a time, the one with the floating IP.
>
> This is a deployment that is better suited to the typical approach
> with an active node, a standby streaming replica, and failover. Tools
> like repmgr help with this.
>

Craig, thanks a lot for your answer!

My use case is keeping the internal state of some load-balanced servers that
need to act as one (e.g. a cluster of VoIP servers).

Last-update-wins is OK.

If I do not use global sequences and use UUIDs as primary keys instead, would
BDR be a suitable choice?
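
To illustrate what I mean by UUID keys, here is a minimal sketch in Python. It assumes random (version 4) UUIDs generated by the application; the function name new_primary_key is made up for the example and is not part of BDR or PostgreSQL.

```python
import uuid


def new_primary_key() -> uuid.UUID:
    """Generate a key locally, with no coordination between nodes.

    Assumption: random (version 4) UUIDs are acceptable as primary keys.
    """
    return uuid.uuid4()


# Each node can generate keys on its own, even while the peer is down.
key_from_node_a = new_primary_key()
key_from_node_b = new_primary_key()
print(key_from_node_a, key_from_node_b)
```

This only removes the need to allocate key ranges between nodes; it does nothing about two nodes updating the same row.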

BDR is appealing not only because of new-toy coolness, but also because of
possible geo-distribution and the apparent simplicity of
installation/management.

BTW, congratulations on the feat!

-giovanni

> > When one machine is down I can no longer refill the sequence's allocated
> > chunk (e.g. the next pool of values)...
>
> Global sequence allocation requires a quorum of half the nodes plus
> one. So in a 2-node system that means both nodes.
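
The quorum rule above can be sketched in a few lines of Python. This is an illustration only, not BDR code: the function names are made up, and "half the nodes" is assumed to mean integer division.

```python
def quorum_size(node_count: int) -> int:
    """Votes needed under a 'half the nodes plus one' rule.

    Assumption: 'half' is rounded down (integer division).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count // 2 + 1


def can_allocate_chunk(node_count: int, reachable_nodes: int) -> bool:
    """True if enough nodes (counting the local one) are reachable."""
    return reachable_nodes >= quorum_size(node_count)


for nodes in (2, 3, 5):
    print(nodes, "nodes -> quorum of", quorum_size(nodes))

# With 2 nodes and one of them down, only 1 node is reachable.
print(can_allocate_chunk(node_count=2, reachable_nodes=1))
```

Under this assumption a 2-node group cannot lose any node and still allocate, while a 3-node group can lose one.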
>
> > How do you deal with this?
>
> Don't use a 2-node multi-master asynchronous replication system as an
> active/standby failover system.
>
> (BTW, newer BDR versions allow you to increase the preallocated chunk
> size, but that's just kicking the ball down the road a bit).
>
> > It seems that BDR global sequences are not a good fit for master-master
> > failover.
>
> It's fine with more nodes. You have bigger worries, though, due to the
> *asynchronous* nature of the replication. You don't know if the peer
> node(s) have received all the changes from the master that failed. Not
> only that, but if it comes back online later, it'll replay those
> changes, and they might get discarded if more recent updates have
> since been applied to those rows, resulting in lost updates. See the
> documentation on multi-master conflicts and last-update-wins.
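
The lost-update scenario described above can be sketched as a toy model in Python. It is not BDR's implementation: the RowVersion type, the resolver function and the timestamps are all hypothetical, and the only rule modelled is "the later commit timestamp wins".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    value: str
    committed_at: float  # hypothetical commit timestamp used to order updates


def resolve_last_update_wins(current: RowVersion, incoming: RowVersion) -> RowVersion:
    """Keep whichever version has the later timestamp; the other is discarded."""
    return incoming if incoming.committed_at > current.committed_at else current


# Node A commits an update, then fails before the change reaches node B.
update_on_a = RowVersion("written on A before the crash", committed_at=10.0)

# Node B takes over and later updates the same row.
row_on_b = RowVersion("written on B after failover", committed_at=20.0)

# Node A comes back online and replays its older change to node B.
row_on_b_after_replay = resolve_last_update_wins(row_on_b, update_on_a)
print(row_on_b_after_replay.value)  # prints "written on B after failover"
```

The update made on A was committed successfully from the application's point of view, yet it never becomes visible once B's newer version exists.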
>
> This is very good behaviour for append-mostly applications, apps that
> are designed to work well with last-update-wins resolution, etc. It's
> really not what you want for some apps, though, and is extremely bad
> for a few workloads like apps that try to generate gapless sequences
> using counter tables. You *must* review the application if you're
> going to deploy it against a BDR system ... or any other asynchronous
> replication based solution.
>
> You can't just deploy a multi-master system like this and treat it as
> a single node. The very design choices that make it tolerant of
> latency and network partitions also mean you have to think much more
> about how the application interacts with the system.
>
> With normal streaming replication you can make it synchronous, so
> there's no such concern. Or you can use it asynchronously, and accept
> that you'll lose some transactions, but you'll at least know (if you
> monitor replica lag) how big a time window you lose, and on failover
> you'll be making the decision to discard those transactions. There
> are no multi-master conflicts to be concerned with, and failover
> becomes a simple (albeit painful) known quantity.
>
> > So, once you have consumed the preallocated chunk (default 15000 values),
> > your surviving machine will no longer be able to insert into a table
> > with a serial column backed by a BDR global sequence.
>
> Correct.
>
> If you don't mind being tied to a fixed limit on the number of nodes
> you can instead use step/offset local sequences.
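
A small Python sketch of the step/offset idea, under the assumption that each node gets a distinct offset from 1 to max_nodes and every node increments by max_nodes. The function name and the numbering scheme are illustrative, not taken from BDR.

```python
from itertools import count, islice


def local_sequence(node_index: int, max_nodes: int):
    """Yield values node_index, node_index + max_nodes, ... for one node.

    Assumption: node_index runs from 1 to max_nodes and max_nodes is fixed
    up front, which is the 'fixed limit on the number of nodes'.
    """
    if not 1 <= node_index <= max_nodes:
        raise ValueError("node_index must be between 1 and max_nodes")
    return count(start=node_index, step=max_nodes)


node1_values = list(islice(local_sequence(1, 2), 5))
node2_values = list(islice(local_sequence(2, 2), 5))
print(node1_values)  # prints [1, 3, 5, 7, 9]
print(node2_values)  # prints [2, 4, 6, 8, 10]
```

The two nodes never produce the same value and need no communication to do so, but adding a third node later means renumbering.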
>
> > We're back to changing the start and increment of each sequence that
> > underlies the "serial" field in each table.
> > And we must do so differently for each node (only two in a master-master
> > failover).
>
> Correct.
>
> > Is there any workaround?
>
> Keep it simple. Use streaming replication and a hot standby.
>
> > For "traditional" (non-BDR) serial columns, is there a way to set in the
> > configuration what the START and INCREMENT of all sequences will be?
>
> No.
>
> > Or must each sequence be individually ALTERed for each serial column
> > in each table?
>
> Yes.
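
Since each sequence has to be altered individually, one way to avoid doing it by hand is to generate the statements. The Python sketch below only builds SQL text and does not connect to a database. The sequence names are hypothetical, plain unquoted identifiers are assumed, and RESTART WITH is only safe as written on empty tables (on existing data the restart value would have to be above the current maximum).

```python
from typing import Iterable, List


def alter_sequence_statements(
    sequence_names: Iterable[str], node_index: int, max_nodes: int
) -> List[str]:
    """Build one ALTER SEQUENCE statement per sequence for a given node.

    Assumptions: node_index runs from 1 to max_nodes, and the names are
    simple identifiers that need no quoting.
    """
    if not 1 <= node_index <= max_nodes:
        raise ValueError("node_index must be between 1 and max_nodes")
    return [
        f"ALTER SEQUENCE {name} INCREMENT BY {max_nodes} RESTART WITH {node_index};"
        for name in sequence_names
    ]


# Hypothetical sequences, following the default <table>_<column>_seq naming.
sequences = ["calls_id_seq", "registrations_id_seq"]

for statement in alter_sequence_statements(sequences, node_index=2, max_nodes=2):
    print(statement)
```

The same list would be run with node_index=1 on the other node, so each node ends up with a different start and the same increment.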
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
