Re: avoiding split brain with repmgr

From: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
To: Aleksander Kamenik <aleksander(dot)kamenik(at)gmail(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: avoiding split brain with repmgr
Date: 2017-08-15 07:31:25
Message-ID: B6F6FD62F2624C4C9916AC0175D56D88594551B3@jenmbs01.ad.intershop.net
Lists: pgsql-admin

>I finally found this document NOT referenced from the main README file in the repmgr repo.
>
>https://github.com/2ndQuadrant/repmgr/blob/master/docs/repmgrd-node-fencing.md
>
>I guess the default solution is pgbouncer

Hello,
I'm not sure that any solution can be considered standard, but we did implement one with pgbouncer.
The script in the linked reference seems somewhat dangerous to me, as it first reconfigures pgbouncer and only then promotes the standby.
This is not safe if the postgres nodes were to suffer a split brain.

In our case we used the following sequence:
- stop pgbouncer
- promote
- reconfigure and restart pgbouncer

This same sequence can be used for a manual switchover.
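
The ordering above could be sketched roughly as follows. This is only an illustration, not the script we actually ran: the systemctl service name, the repmgr.conf path, and the update_pgbouncer_ini.sh helper are assumptions, and it presumes a single pgbouncer managed on the local host. The point it shows is that each step runs only if the previous one succeeded, so clients are cut off before the promotion and are let back in only after pgbouncer points at the new master.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# All commands below are assumptions for illustration; adjust to your setup.
STOP_PGBOUNCER = ["systemctl", "stop", "pgbouncer"]
PROMOTE = ["repmgr", "-f", "/etc/repmgr.conf", "standby", "promote"]
# Hypothetical helper that rewrites pgbouncer.ini to point at the new master.
RECONFIGURE = ["/usr/local/bin/update_pgbouncer_ini.sh"]
START_PGBOUNCER = ["systemctl", "start", "pgbouncer"]

# Order matters: no client may reach any node while the promotion happens.
FAILOVER_STEPS: List[Tuple[str, List[str]]] = [
    ("stop pgbouncer", STOP_PGBOUNCER),
    ("promote standby", PROMOTE),
    ("reconfigure pgbouncer", RECONFIGURE),
    ("start pgbouncer", START_PGBOUNCER),
]


def run_failover(
    run: Callable[[Sequence[str]], int] = subprocess.call,
    steps: Sequence[Tuple[str, List[str]]] = FAILOVER_STEPS,
) -> List[str]:
    """Run the steps in order and abort at the first non-zero exit code.

    Returns the names of the steps that completed.
    """
    done: List[str] = []
    for name, cmd in steps:
        if run(cmd) != 0:
            # Stop here: later steps must never run after a failed one.
            raise RuntimeError(f"step {name!r} failed; aborting failover")
        done.append(name)
    return done
```

If "stop pgbouncer" fails, nothing is promoted; if the promotion fails, pgbouncer stays down rather than being pointed at a node that is not a master.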

regards,

Marc Mamin

>
>Any simpler solutions for this tricky problem?
>
>Regards,
>
>Aleksander
>
>On Mon, Aug 14, 2017 at 5:03 PM, Aleksander Kamenik <aleksander(dot)kamenik(at)gmail(dot)com> wrote:
>> Hi!
>>
>> In a cluster set up with postgres 9.6, streaming replication and
>> repmgr I'm struggling to find a good/simple solution for avoiding
>> split brain.
>>
>> The current theoretical setup consists of 4 nodes across two data
>> centers. The master node is set up with 1 of 3 synchronous replication.
>> That is, it waits for at least one other node to COMMIT as well.
>> repmgrd is installed on every node.
>>
>> The clients will use postgresql JDBC with targetServerType=master so
>> they connect only to the master server in a list of four hosts.
>>
>> The split brain scenario I foresee is when the master node locks up or
>> is isolated for a while and comes back online after repmgrd on the other
>> nodes has elected a new master.
>>
>> As the original master node has a requirement of one synced
>> replication node and the remaining two standbys are streaming from the
>> new master it will fortunately not start writing a separate timeline,
>> but will still serve dated read only queries. For writes it will
>> accept connections which hang. The repmgrd instance on the original
>> master sees no problem either so does nothing.
>>
>> Ideally though this instance should be shut down as it has no slaves
>> attached and the status on other nodes indicates this master node is
>> failed.
>>
>> Any suggestions? I'm trying to keep the setup simple without a central
>> pgbouncer/pgpool. Any simple way to avoid a central connection point
>> or custom monitoring script that looks for exactly this issue?
>>
>> Also, do you see any other potential pitfalls in this setup?
>>
>> Thanks for thinking this through,
>>
>> Aleksander
>>
>> --
>> Aleksander Kamenik
>
>
>
>--
>Aleksander Kamenik