From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Konstantin Osipov <kostja(dot)osipov(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Built-in Raft replication |
Date: | 2025-04-16 04:58:55 |
Message-ID: | 212D5973-FDD0-4CF5-BCD0-2760EC319DF3@yandex-team.ru |
Lists: | pgsql-hackers |
> On 16 Apr 2025, at 09:33, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> In my experience, the load of managing hundreds of replicas which all
> participate in RAFT protocol becomes more than regular transaction
> load. So making every replica a RAFT participant will affect the
> ability to deploy hundreds of replica.
There is no need to make every standby a voter, and no need for a flat topology. pg_consul uses HA groups of 2/3 or 3/5 and cascades all other standbys from the HA group.
Existing tools already solve the original problem; Konstantin is just proposing to solve it in some standard “official” way.
> We may build an extension which
> has a similar role in PostgreSQL world as zookeeper in Hadoop.
Patroni, pg_consul and others already use zookeeper, etcd and similar systems for consensus.
Is it any better as an extension than as etcd?
> It can
> be then used for other distributed systems as well - like shared
> nothing clusters based on FDW.
I didn’t get the FDW analogy. Why should other distributed systems choose a Postgres extension over Zookeeper?
> There's already a proposal to bring
> CREATE SERVER to the world of logical replication - so I see these two
> worlds uniting in future.
Again, I’m lost here. Which two worlds?
> The way I imagine it is some PostgreSQL
> instances, which have this extension installed, will act as a RAFT
> cluster (similar to Zookeeper ensemble or etcd cluster).
That’s exactly what is proposed here.
> The
> distributed system based on logical replication or FDW or both will
> use this ensemble to manage its shared state. The same ensemble can be
> shared across multiple distributed clusters if it has scaling
> capabilities.
Yes, shared DCS are common these days. AFAIK, we use one Zookeeper instance per hundred Postgres clusters to coordinate pg_consuls.
Actually, scalability is the opposite of the topic of this thread. Let me explain.
Currently, Postgres automatic failover tools rely on databases that have built-in automatic failover of their own. Konstantin is proposing to shorten this loop and make Postgres use its own built-in automatic failover.
So, existing tooling allows you to have 3 hosts for the DCS, with a majority of 2 hosts able to elect a new leader in case of failover.
And you can have only 2 hosts for Postgres: Primary and Standby. You can have 2 big Postgres machines with 64 CPUs, and 3 one-CPU hosts for Zookeeper/etcd.
If you use built-in failover, you have to resort to 3 big Postgres machines, because you need a 2/3 majority. Of course, you can install a MySQL-style arbiter: a host that has no real PGDATA and only participates in voting. But this is a solution to a problem induced by built-in autofailover.
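The quorum arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the helper names `majority` and `tolerated_failures` and the layout labels are made up for the example and are not part of any existing tool.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority is still reachable."""
    return voters - majority(voters)


# Hypothetical layouts; host counts follow the example in the text.
# "voters" is whoever takes part in leader election.
layouts = {
    "external DCS, 3 small hosts + 2 big Postgres": 3,
    "built-in failover, 2 big Postgres": 2,
    "built-in failover, 3 big Postgres": 3,
    "built-in failover, 2 big Postgres + 1 arbiter": 3,
}

for name, voters in layouts.items():
    print(
        f"{name}: majority {majority(voters)}/{voters}, "
        f"survives {tolerated_failures(voters)} voter failure(s)"
    )
```

With only two voters the majority is 2 of 2, so losing either host blocks the election. That is why built-in failover needs a third voter, either a full Postgres machine or an arbiter.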
Best regards, Andrey Borodin.