Re: BDR OR MASTER SLAVE REPLICATION WITH REPMGR IS SUITABLE FOR LARGE ONLINE APPLICATION

From: "Day, Joseph" <jday(at)gisolutions(dot)us>
To: Alvaro Aguayo Garcia-Rada <aaguayo(at)opensysperu(dot)com>
Cc: soumitra bhandary <soumitra(dot)bhandary(at)hotmail(dot)com>, Rijo Roy <royvk6644(at)gmail(dot)com>, pgsql-admin(at)lists(dot)postgresql(dot)org, Prince Pathria <prince(dot)pathria(at)goevive(dot)com>, bdr-list(at)2ndquadrant(dot)com
Subject: Re: BDR OR MASTER SLAVE REPLICATION WITH REPMGR IS SUITABLE FOR LARGE ONLINE APPLICATION
Date: 2019-04-12 05:22:47
Message-ID: CAAFCUrUSfmYzWzK+F8dzhyGyN6EJmJ6u8D5AG_CVAwjTmY-EMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

:(
On Fri, Apr 12, 2019, 12:22 AM Alvaro Aguayo Garcia-Rada <
aaguayo(at)opensysperu(dot)com> wrote:

> Hi. Some comments on your questions. Hope it helps.
>
> 1. I had this problem many times in the past. It's not related to the log
> size, but to the wal segment size, apparently. More exactly, I experienced
> that problem each time I had a LARGE transaction(about 10M updates on one
> table an 60M-100M on another one, all in a single transaction). My solution
> was to modify my application in order to use smaller transactions(BTW, that
> large transaction was a bug anyway).
>
> Recently, I experienced a network outage between my 2 db nodes for about 6
> hours. When connection came back, both nodes started to replicate with no
> problem(Got lots of conflict resolution messages).
>
> So, I insist, this is probably more related to transaction size, not log
> size. An should not be a problem arising from network outage.
>
>
> 2. When the problem happened, I used bdr_init_copy. I consider it the
> fastest way. But that will only do if you can put down one of your database
> servers for a while(exact time depends on database size and network speed).
> If not, you would probably prefer to use bdr_group_join function, which can
> be run with postgres running on destination node. Also, while
> bdr_group_join function joins only a specific database to replication,
> bdr_init_copy setup replication for all bdr-enabled database on origin node.
>
>
> 3. BDR does replicate DDL. It has some restrictions(
> http://bdr-project.org/docs/stable/ddl-replication-statements.html), but
> should not be a problem. However, I would not recommend to use DDL on your
> application. First, because DDL replication requires full database lock,
> which will cause all transactions to wait until the DDL replication is
> completed on all nodes. Second, because, under some circumstances(still not
> have it really clear), DDL replication will hang up, and you will have to
> restart some of the nodes in order to be able to replicate DDL again. I
> recommend DDL replication only for sysadmin usage, and with caution. Also,
> because BDR is asynchronous, if you are about to run a sequence like this
> one:
>
> UPDATE tab SET col = 0;
> ALTER TABLE tab ALTER COLUMN col SET NOT NULL;
>
> You must wait a while before running the ALTER, because it could
> happen(has happened to me) that the ALTER starts when the UPDATE has not
> yet been replicated; that would cause a lock where postgres on any other
> node will try to replicate that DDL and will fail forever because the
> column still contains NULL values. The only solution I found was to run
> postgres in single user mode, and that is really dangerous if you are using
> BDR.
>
>
> 4. Your first case may not happen, because BDR will replicate transaction
> in order. For the second one, I've never ran into any similar case.
>
>
> In brief, while BDR is powerful and really useful, it's a little bit
> dangerous. "Use with caution". Also, you will probably have to tune your
> application so you won't run into BDR-specific problems.
>
> BTW, there is no exactly a "licensed version" and a "free/comunity
> version". 1.x is available freely, while 2.x was made available only to
> 2ndQuadrant customers. 3.x is currently on the same state, but it is
> supposed to be made publicly available, if I'm not wrong.
>
> Regards,
>
> Alvaro Aguayo
> Operations Manager
> Open Comb Systems E.I.R.L.
>
> (+51-1) 337-7813 Ext. 4002
> www.ocs.pe
>
> ----- Original Message -----
> From: "soumitra bhandary" <soumitra(dot)bhandary(at)hotmail(dot)com>
> To: "Rijo Roy" <royvk6644(at)gmail(dot)com>, pgsql-admin(at)lists(dot)postgresql(dot)org,
> "Prince Pathria" <prince(dot)pathria(at)goevive(dot)com>, bdr-list(at)2ndquadrant(dot)com
> Sent: Thursday, 11 April, 2019 21:50:50
> Subject: Re: BDR OR MASTER SLAVE REPLICATION WITH REPMGR IS SUITABLE FOR
> LARGE ONLINE APPLICATION
>
> Thanks for the response .
>
> Here is my observation / queries about BDR , correct me if I am missing
> anything . Any other issue with BDR if you guys faced let me know .
>
>
> 1. Due to any reason if replication is failing and replication log byte
> size reaches more than 1GB . Then corresponding replication slot become
> unresponsive and need to set up BDR again.
> 2. Due to any issue (may be network failure or any key conflicts) if
> replication is stopped or failing then apart from re initializing the
> cluster (using bdr_init_copy<
> http://bdr-project.org/docs/stable/command-bdr-init-copy.html> ) any
> other approach is there to revive the replication ? As bdr.bdr_group_join<
> http://bdr-project.org/docs/stable/functions-node-mgmt.html#FUNCTION-BDR-GROUP-JOIN>
> does not work well for large databases for more than 100 GB .
> 3. For a large OLTP application does BDR replicates the changes made to
> procedure, function ,triggers , views definition, across the cluster if I
> set the replication at DB level ? (I am not sure as not tested)
> 4. Though BDR tries to resolve conflicts using a simple
> last-update-wins strategy, but still conflicts as mentioned below are very
> common in a production instance . Are these taken care in the licensed
> version ?
>
> ERROR: insert or update on table "AAAAA" violates foreign key constraint
> "fk_traf444kk6qrkms7n56aiwq5y"
>
>
> Violating foreign key constraint ,whereas other node is having record.
>
>
> ERROR: multiple unique constraints violated by remotely Inserted tuple
>
>
> An INSERT/INSERT conflict can violate more than one UNIQUE constraint (of
> which one might be the PRIMARY KEY).
>
>
> 1. Back up , recovery and PITR process . As per the documentation ,
> Logical replication cannot be used for point-in-time recovery . So what is
> the extent of this statement ? Does it imply even we run the DBs in
> archival mode and use pg_basebackup to take file level back up in a running
> instance?
>
> Thanks ,
> Soumitra
>
> ________________________________
> From: Rijo Roy <royvk6644(at)gmail(dot)com>
> Sent: Friday, April 12, 2019 7:42 AM
> To: soumitra bhandary
> Cc: Prince Pathria; pgsql-admin(at)lists(dot)postgresql(dot)org
> Subject: Re: BDR OR MASTER SLAVE REPLICATION WITH REPMGR IS SUITABLE FOR
> LARGE ONLINE APPLICATION
>
> BDR is best when you have a need of multi-master environment like a
> distributed database paradigm where you have a need of writing/reading
> to/from local masters let's say masters in NY, London, China, India etc..
> And it uses pglogical to achieve this with a globally managed transaction
> XID's. It needs to be properly tested how it handles conflicts in a heavy
> oltp workload application.
> If you only need high availability for a heavy oltp application, a well
> set active passive model of streaking replication backed by repmgr where
> writes are only hitting primary and reads on standby will help. Otherwise
> you can also consider citusdata..
>
> Note: I am not an advocate of any vendors, my comments are purely based on
> my experience and I love Postgresql the way it is!
>
> Regards,
> Rijo Roy
>
>
> On Fri, 12 Apr 2019, 6:34 am soumitra bhandary, <
> soumitra(dot)bhandary(at)hotmail(dot)com<mailto:soumitra(dot)bhandary(at)hotmail(dot)com>>
> wrote:
> To ensure zero down time.
>
> Sent from my iPhone
>
> On 12-Apr-2019, at 12:35 AM, Prince Pathria <prince(dot)pathria(at)goevive(dot)com
> <mailto:prince(dot)pathria(at)goevive(dot)com>> wrote:
>
> It very much depends upon your use case. Why you want a multi master
> implementation?
>
> Happy to help :)
> Prince Pathria Systems Engineer Evive +91 9478670472 goevive.com<
> http://goevive.com>
>
>
> On Thu, Apr 11, 2019 at 8:03 PM soumitra bhandary <
> soumitra(dot)bhandary(at)hotmail(dot)com<mailto:soumitra(dot)bhandary(at)hotmail(dot)com>>
> wrote:
> Hi ,
>
> Can anyone suggest me for large OLTP application which sort of replication
> is suitable BDR or master slave replication with synchronous and
> asynchronous node and REPMGR enabled.
> Can BDR handle real-time conflicts arises due to high network latency or
> Dataintegrity conflicts?? Please suggest
>
> Sent from my iPhone
>
>
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Pierre Ochsenbein 2019-04-12 07:57:17 LDAP security connection
Previous Message Alvaro Aguayo Garcia-Rada 2019-04-12 05:22:28 Re: BDR OR MASTER SLAVE REPLICATION WITH REPMGR IS SUITABLE FOR LARGE ONLINE APPLICATION