Re: Built-in Raft replication

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Konstantin Osipov <kostja(dot)osipov(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Built-in Raft replication
Date: 2025-04-16 06:18:05
Message-ID: CAExHW5tq8ShEfboFa52wDMWmyaW+6kX_6x8MZ-0TP65tOhJ1wQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 16, 2025 at 10:29 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> > We may build an extension which
> > has a similar role in the PostgreSQL world as ZooKeeper does in Hadoop.
>
> Patroni, pg_consul and others already use ZooKeeper, etcd and similar systems for consensus.
> Is it any better as an extension than as etcd?

I feel so. An extension runs within a PostgreSQL process and speaks the
same wire protocol as PostgreSQL, whereas etcd is a separate process
speaking a separate protocol.
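
A purely hypothetical sketch of what that could look like (the pg_dcs
extension and its get() function below are invented for illustration;
no such extension exists): the application reads cluster state over the
same connection and driver it already uses for SQL, instead of speaking
a second protocol to a separate etcd endpoint.

import psycopg2

# One connection, one wire protocol, one driver.
conn = psycopg2.connect("host=pg1 dbname=postgres user=postgres")
with conn.cursor() as cur:
    # An ordinary application query ...
    cur.execute("SELECT count(*) FROM pg_stat_activity")
    print(cur.fetchone()[0])
    # ... and, hypothetically, a cluster-state read served by the
    # consensus extension over the same protocol (invented function).
    cur.execute("SELECT pg_dcs.get(%s)", ("cluster/leader",))
    print(cur.fetchone()[0])
conn.close()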

>
> > It can
> > then be used for other distributed systems as well, like
> > shared-nothing clusters based on FDW.
>
> I didn't get the FDW analogy. Why should other distributed systems choose a Postgres extension over ZooKeeper?

By other distributed systems I mean distributed systems built on
PostgreSQL: FDW-based native sharding, native replication, or a system
that uses both.

>
> > There's already a proposal to bring
> > CREATE SERVER to the world of logical replication, so I see these two
> > worlds uniting in the future.
>
> Again, I’m lost here. Which two worlds?

Logical replication and FDW-based native sharding.

>
> > A
> > distributed system based on logical replication, FDW, or both will
> > use this ensemble to manage its shared state. The same ensemble can be
> > shared across multiple distributed clusters if it has scaling
> > capabilities.
>
> Yes, shared DCS are common these days. AFAIK, we use one ZooKeeper instance per hundred Postgres clusters to coordinate pg_consuls.
>
> Actually, scalability is the opposite of the topic of this thread. Let me explain.
> Currently, Postgres automatic failover tools rely on external databases with their own built-in automatic failover. Konstantin is proposing to shorten this loop and make Postgres use its own built-in automatic failover.
>
> So, existing tooling allows you to have 3 hosts for the DCS, with a majority of 2 hosts able to elect a new leader in case of failover.
> And you can have only 2 hosts for Postgres: primary and standby. You can have 2 big Postgres machines with 64 CPUs, and 3 one-CPU hosts for ZooKeeper/etcd.
>
> If you use built-in failover, you have to resort to 3 big Postgres machines because you need a 2/3 majority. Of course, you can install a MySQL-style arbiter: a host that has no real PGDATA and only participates in voting. But that is a solution to a problem induced by built-in autofailover.

Users find it a waste of resources to deploy 3 big PostgreSQL instances
just for HA when 2 suffice, even if that means also deploying 3
lightweight DCS instances. Letting only some of the nodes act as DCS
members, while the others remain purely PostgreSQL nodes, would reduce
that waste.
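
To make the sizing argument concrete, here is a minimal sketch of the
majority-quorum arithmetic behind the scenarios above (plain Python,
assuming nothing beyond Raft-style majority voting):

# Smallest majority among a set of voting members.
def quorum(voters: int) -> int:
    return voters // 2 + 1

# External DCS: 3 small hosts vote; Postgres needs only 2 big machines.
print(quorum(3))  # 2: a 3-node etcd/ZooKeeper ensemble survives 1 failure

# Built-in failover with only 2 Postgres voters: quorum(2) == 2, so
# losing either node loses quorum.
print(quorum(2))  # 2

# Hence the 3rd big Postgres machine, or a lightweight voting-only
# arbiter (no PGDATA) restoring an odd-sized electorate of 3.
print(quorum(3))  # 2: 2 data nodes + 1 arbiter tolerate 1 failure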

--
Best Wishes,
Ashutosh Bapat
