Multi master disjoint cluster

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Multi master disjoint cluster
Date: 2022-10-26 16:34:16
Message-ID: CAM+6J95-vXzF6v55cSXaKMVBrCdAc3RVqKEb1Aau+6KZfc8Hqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

tl;dr
I have a simple question,
given a choice if I can write the same data to two databases in parallel,
should I opt for primary / replica setup or multi writer/master setup. This
setup has the ability to make use of kafka consumer groups (like two
replication slots each having their own lsn offsets) to write to both db
node pairs in parallel via the application layer.

The churn of data is really high, there is a lot of wals generated, around
500gb/hr.

If I go with primary/replica, (lr not feasible)

I need to ensure both are on the same major version. Upgrades are tricky
(we don't have qa) so we just have option to schema dump for upgrade
compatibility. Data, we trust postgresql for that :). (I wish we had zfs
everywhere but no )

Any excl table blocking operations, (although with later versions there are
very less blocking operations) can impact queries on replica as well
(excluding delay settings).

Corruption can cause downtime (we have tons of them and raids to protect
them) so if replica is having issues, we can zero the pages on the replica
and do some operations if we isolate the problem pages, else resync the
replica from primary. But if primary is having some issues, we copy data
from replica to disk and copy in to primary after truncating etc. Some
downtime but not a lot. (I am not expert at data recovery) and mostly rely
on amcheck, dd, and raid checks.

We don't use pitr (too many wals × 58) or delayed replication as we can't
afford more servers.

ddl deploys are guaranteed by replication. So no need to try 2pc like stuff
at app layer. (Although apps use deploy tools to ensure eventually the ddls
are consistent and idempotent)

Basically primary/replica relieves the app to think what is there on the
primary is also on the replica eventually, so there can be source of truth.

But with multi writers, any app mishandling like bug in catching exception
etc can result in diversion and no more mirrored setup.
We need to have checks/reconciliation to ensure both write nodes in pair
have almost similar data at the end of the day so we can trust this setup
independent of any app mistakes.

But if app layer gets robust, we have almost no downtime in reads and
writes, we can have both nodes on different versions, (w/o logical
replication) can query both nodes real time, no real replication lag issues
, conflicts etc, can upgrade like blue green, canary test some changes on
one if needed etc.

Am I making sense at all? Or I am sounding confused, and I don't know the
difference between primary/replica vs multi writer. This is not bdr like
thing, they don't really need each other unless we are into some recovery.

My point is, we have 58 such primary/replica shards (each 10tb+)
(consistent hashing at app layer, no fdw) and there is no scope of
downtime for reads, so any issue like post upgrade performance degradation
(if any) gives me chills. and we have no qa to test real data.

There are too many dimensions to shard on and aggregations need to run
across the shards (Yes there is no scope of data isolation).

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vijaykumar Jain 2022-10-26 17:04:13 Re: Multi master disjoint cluster
Previous Message Laurenz Albe 2022-10-26 15:35:06 Re: How to know how much CPU, RAM is used by existing 1 database